Object Detection service with YOLO and FastAPI
Deploy a custom-trained machine learning model that detects objects in images
In my previous machine learning articles 1, 2, 3, I covered an important task in computer vision: image classification. In this article, I want to explore the “object detection” task and deploy a machine learning model with FastAPI, so it can be consumed through a RESTful API interface.
The following article provides a good overview of object detection.
YOLO is a type of deep neural network commonly used for the “object detection” task, and it is very fast. The latest version of YOLO is v7. We will use YOLO in this article.
COCO dataset format
In the “object detection” world, COCO (which stands for Common Objects in Context) is a dataset format used for object detection research. At a high level, it uses JSON for image annotation: it specifies the image location and other basic information, and most importantly, it annotates the bounding boxes in each image with the corresponding object category (e.g. which location in the image is an apple, which is a banana, etc.). The following is a sample partial COCO JSON. Notice that annotations[].image_id refers to an actual images[].id and annotations[].category_id refers to categories[].id, so you know which annotation refers to which bounding box/location in which image, and the category of that bounding box.
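A minimal illustrative fragment (real COCO files contain more fields, e.g. info, licenses, and segmentation; bbox follows the COCO convention of [x_min, y_min, width, height] in pixels):
{
  "images": [
    {"id": 9, "file_name": "000000000009.jpg", "width": 640, "height": 480}
  ],
  "annotations": [
    {"id": 1, "image_id": 9, "category_id": 52,
     "bbox": [50, 30, 100, 80], "area": 8000, "iscrowd": 0}
  ],
  "categories": [
    {"id": 52, "name": "banana", "supercategory": "food"}
  ]
}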
For more information, refer to the following articles about COCO format.
Use the code in the link to download a sample COCO dataset
You can use pycocotools to parse a COCO dataset in Python. Below are some samples that parse the annotations and display them on the images, as in this article.
# instances with bounding boxes and category text
from pathlib import Path

import matplotlib.pyplot as plt
import skimage.io as io
from pycocotools.coco import COCO

dataDir = Path('coco/images/val2017')
annFile = Path('coco/annotations/instances_val2017.json')
coco = COCO(annFile)
imgIds = coco.getImgIds()
imgs = coco.loadImgs(imgIds[-3:])

# squeeze=False keeps axs two-dimensional even when there is only one image
_, axs = plt.subplots(len(imgs), 2, figsize=(10, 5 * len(imgs)), squeeze=False)
for img, ax in zip(imgs, axs):
    I = io.imread(dataDir / img['file_name'])
    annIds = coco.getAnnIds(imgIds=[img['id']])
    anns = coco.loadAnns(annIds)
    ax[0].imshow(I)
    ax[1].imshow(I)
    plt.sca(ax[1])
    coco.showAnns(anns, draw_bbox=True)
    for ann in anns:
        cat = coco.loadCats(ann['category_id'])
        cat_name = cat[0]['name']
        # place the category name at the top-left corner of the bounding box
        ax[1].text(ann['bbox'][0], ann['bbox'][1], cat_name, style='italic',
                   bbox={'facecolor': 'white', 'alpha': 0.7, 'pad': 5})

# person keypoints
dataDir = Path('coco/images/val2017')
annFile = Path('coco/annotations/person_keypoints_val2017.json')
coco = COCO(annFile)
imgIds = coco.getImgIds()
imgs = coco.loadImgs(imgIds[-1:])

_, axs = plt.subplots(len(imgs), 2, figsize=(10, 5 * len(imgs)), squeeze=False)
for img, ax in zip(imgs, axs):
    I = io.imread(dataDir / img['file_name'])
    annIds = coco.getAnnIds(imgIds=[img['id']])
    anns = coco.loadAnns(annIds)
    ax[0].imshow(I)
    ax[1].imshow(I)
    plt.sca(ax[1])
    coco.showAnns(anns, draw_bbox=False)
Model inference with YOLOv5 and FastAPI on Google Colab
When you run a FastAPI server on Google Colab, you want to expose the endpoint to the public, so that you can access it from the Internet. Previously you could use ngrok, as described in this article and this colab notebook. However, it seems ngrok now requires you to register an account before using it. I also tried this colab notebook, but python minimal_server.py just hangs. I finally found colab-xterm, which solved the problem; you can run multiple xterms in parallel (e.g. server and client) after enabling third-party cookies.
!pip install colab-xterm
%load_ext colabxterm
Then run
%xterm
I followed the repo below.
Quick recap of the FastAPI server code. FastAPI is a fast (high-performance) Python web framework for building APIs.
Typically, each time you hit a FastAPI endpoint hosting a machine learning model, the following happens:
- Based on your routing setup, FastAPI routes the request to the function annotated with the matching route
- Your code then uses the model (the model is typically loaded globally, to avoid loading it on every request)
- Call the model's predict function or do a forward pass of the model to get the prediction result
- Convert the result into JSON
The main uvicorn server code tells the server to listen at localhost:8000 and run the app instance, which is a FastAPI instance.
from fastapi import FastAPI

app = FastAPI()

# Python main entrypoint
if __name__ == '__main__':
    import uvicorn

    # app_str - the ASGI application to run, in the format "<module>:<attribute>"
    app_str = 'server_minimal:app'
    uvicorn.run(app_str, host='localhost', port=8000, reload=True, workers=1)
The main object detection API endpoint. FastAPI uses a decorator to declare that the combination of “/” and HTTP POST should be routed to the process_home_form function. The function takes the passed-in file stream and model name, gets the prediction result, converts it to the required JSON format, and sends it back.
import torch
from fastapi import File, Form, UploadFile
from io import BytesIO
from PIL import Image

@app.post("/")
async def process_home_form(file: UploadFile = File(...),
                            model_name: str = Form(...)):
    '''
    Requires an image file upload and a model name (e.g. yolov5s).
    Returns: JSON response with a list of lists of dicts.
    Each dict contains class, class_name, confidence, normalized_bbox.
    Note: because this is an async method, the YOLO inference is a blocking
    operation.
    '''
    # torch.hub caches the model after the first download
    model = torch.hub.load('ultralytics/yolov5', model_name,
                           pretrained=True, force_reload=False)

    # this is how you decode + process the uploaded image with PIL
    results = model(Image.open(BytesIO(await file.read())))

    json_results = results_to_json(results, model)
    return json_results
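The results_to_json helper is defined elsewhere in the repo. A minimal sketch of what such a helper might look like (assuming YOLOv5's results.xyxyn attribute, which holds one tensor per input image with rows of [x1, y1, x2, y2, confidence, class] in normalized coordinates; the repo's actual implementation may differ):
def results_to_json(results, model):
    # one inner list per input image; one dict per detected object
    return [
        [
            {
                'class': int(pred[5]),
                'class_name': model.names[int(pred[5])],
                'confidence': float(pred[4]),
                'normalized_bbox': [float(c) for c in pred[:4]],
            }
            for pred in image_preds
        ]
        for image_preds in results.xyxyn
    ]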
If you look at the client code, you will see it posts to http://localhost:8000, passing the model name and the image file stream.
import json
from pprint import pprint

import requests as r

def send_request(image='../images/zidane.jpg', model_name='yolov5s'):
    with open(image, 'rb') as f:  # pass the file here
        res = r.post("http://localhost:8000",
                     data={'model_name': model_name},
                     files={'file': f})
    pprint(json.loads(res.text))

if __name__ == '__main__':
    send_request()
YOLOv5 object detection RESTful API server
Object detection API client
In the real world, you would build a container image with the necessary dependencies (e.g. the Python runtime, the Python packages needed by FastAPI (e.g. fastapi, uvicorn) and YOLO (e.g. pytorch), and the machine learning model). Below is a sample Dockerfile.
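A minimal sketch: it assumes the server code lives in server_minimal.py and that the dependencies (fastapi, uvicorn, torch, pillow, etc.) are pinned in a requirements.txt; adjust names and versions to your repo.
FROM python:3.10-slim

WORKDIR /app

# install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# copy the FastAPI server code
COPY server_minimal.py .

EXPOSE 8000

# bind to 0.0.0.0 so the endpoint is reachable from outside the container
CMD ["uvicorn", "server_minimal:app", "--host", "0.0.0.0", "--port", "8000"]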
YOLO transfer learning/custom training
You may also want to train your own model on your own data, starting from a pre-trained YOLO model. The YOLOv5 repo provides a tutorial. To use it, you need to provide the data in the following format.
- images and labels folders
- Each image needs a corresponding .txt file in the labels folder with the same name, e.g. data/images/train/000000000009.jpg needs to have data/labels/train/000000000009.txt containing the annotations
- The label file format: each line has five numbers; the first is the category id, and the other four denote the bounding box as normalized center x, center y, width, height. Each file can have multiple lines for multiple annotations (see the COCO-to-YOLO conversion sketch after the config example below), e.g.
0 0.35750000000000004 0.53875 0.463334 0.39499999999999996
- Training configuration file providing the information about data and labels, e.g. in the following, you have the dataset root path, the relative paths of the training, validation and test datasets, and the list of classes under names
path: ../datasets/coco128  # dataset root dir
train: images/train2017  # train images (relative to 'path') 128 images
val: images/train2017  # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes
names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
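If your annotations are already in COCO JSON, converting them to the YOLO label format above is straightforward. A minimal sketch, assuming the COCO bbox convention of [x_min, y_min, width, height] in pixels (note that COCO category ids also need to be remapped to 0-based YOLO class indices):
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert a COCO bbox [x_min, y_min, width, height] in pixels
    to a YOLO bbox [x_center, y_center, width, height], normalized."""
    x_min, y_min, w, h = bbox
    return [(x_min + w / 2) / img_w,
            (y_min + h / 2) / img_h,
            w / img_w,
            h / img_h]

# example: a 100x80 box at (50, 30) in a 640x480 image
print(coco_bbox_to_yolo([50, 30, 100, 80], 640, 480))
# [0.15625, 0.14583333333333334, 0.15625, 0.16666666666666666]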
YOLOv5 also separates the backbone (which extracts image features) from the head layers. Generally you want to keep the backbone weights untouched during transfer learning; there is a “freeze” parameter to keep the backbone layers frozen. The basic command to train on a custom dataset is
python train.py --img <image size> --batch <batch size> --epochs <epoch> --data <training config file/coco128.yaml> --weights <pretrained model> --freeze <number of layers to freeze>
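For example, the YOLOv5 transfer-learning tutorial freezes the first 10 layers (the YOLOv5s backbone); a concrete invocation (the values here are illustrative) could be
python train.py --img 640 --batch 16 --epochs 100 --data coco128.yaml --weights yolov5s.pt --freeze 10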
Use the custom-trained model
python detect.py --weights runs/train/exp2/weights/best.pt --img 640 --conf 0.25 --source data/images
While FastAPI is fine for deploying a machine learning model in a quick-start project, large-scale production workloads need a mature model-serving framework. That will be covered in another article.
Appendix
Tool to work with COCO format
The sahi Python package can generate COCO format annotations (it is actually a lightweight vision library for object detection).
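For example, creating a COCO file programmatically with sahi might look like this (API as documented in sahi's COCO utilities at the time of writing; verify against the version you install):
from sahi.utils.coco import Coco, CocoAnnotation, CocoCategory, CocoImage
from sahi.utils.file import save_json

coco = Coco()
coco.add_category(CocoCategory(id=0, name='person'))

# register an image and attach a bounding-box annotation to it
image = CocoImage(file_name='000000000009.jpg', height=480, width=640)
image.add_annotation(
    CocoAnnotation(bbox=[50, 30, 100, 80], category_id=0, category_name='person')
)
coco.add_image(image)

save_json(data=coco.json, save_path='annotations.json')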
YOLO
History of YOLO: older versions of YOLO are based on Darknet, but from v5 onwards, most implementations are based on PyTorch.
Deploy machine learning model with FastAPI
Object detection using languages other than Python
YOLO transfer learning
PyTorch Lightning-based YOLO may not be mature yet; I only found some related posts.