We had a task to detect garbage trucks in video, but popular datasets like COCO don’t include a garbage-truck class. In this post I will show how to create your own object-detection dataset with custom classes, train a YOLOv3 model on it, and test the model on images and videos.
Choosing a CNN model
We studied benchmarks and experimental comparisons of different object-detection models; here is a good comparison of SOTA models. SSD with MobileNet provides the best accuracy/speed trade-off but has problems detecting small objects. An ensemble of Faster R-CNN with ResNet and Inception ResNet shows high accuracy on small objects but has the lowest frame rate of all the models.
So we decided to use YOLOv3 as a good trade-off. Moreover, there are plenty of articles on the internet describing how to use YOLOv3 for object detection.
Preparing training dataset
To prepare our own training dataset for object detection, we can scrape images from open sources like Google, Flickr, etc., and label them.
Here we use images from Google. Parsing Google Images is quite tricky. I used the parser from here and made one modification to run Selenium in headless mode:
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
Also be careful with the chromedriver binary. I installed it on my system:
sudo apt-get install chromium-chromedriver
and specified the system path to it when initializing the driver:
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver", options=options)
You can find my parser script here.
Once that’s done, we can explore the data to check whether we are happy with what we obtained. During exploration we notice a lot of dirty samples: some images are very small (less than 200 px on the smaller side, well below the network input size), and many samples for the keywords ‘bus’ and ‘tractor’ turn out to be toys or cartoon pictures.
I have implemented a script for filtering out small images.
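As a rough sketch of such a filter (assuming Pillow is installed and the images are JPEGs in a flat directory; the function names and 200 px threshold are from the observation above), it can look like this:

```python
from pathlib import Path
from PIL import Image  # Pillow, assumed installed

MIN_SIDE = 200  # discard images whose smaller side is below this many pixels

def is_too_small(size, min_side=MIN_SIDE):
    """size is a (width, height) tuple, as returned by PIL's Image.size."""
    return min(size) < min_side

def filter_small_images(directory):
    """Delete undersized JPEG images from `directory` (non-recursive)."""
    for path in Path(directory).glob("*.jpg"):
        with Image.open(path) as img:
            too_small = is_too_small(img.size)
        if too_small:
            path.unlink()  # remove the file after the image handle is closed
```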
For labeling images we used the labelMe tool. Install it following this guide and open labelMe from the terminal.
Open a directory with images of one class in labelMe using the Open Dir button.
You can move between images using the Next Image / Prev Image buttons or via the File List panel.
Let’s draw a bounding box around an object. Choose the Create Rectangle tool from the Edit menu, then drag the cursor to size the rectangle. Once you are happy with it, release the mouse and a popup for the label will appear.
Enter the class name in the text field and confirm it. The label will appear in the label list on the right-hand side of the main window.
Once bounding boxes are created for all objects in the image, press Ctrl+S to save the result. It will be saved in a JSON file named after the image file.
Now we need to convert the JSON to a format YOLOv3 can work with. YOLOv3 expects labels in the format
<class_number> <x_center> <y_center> <width> <height>
Box coordinates (x_center, y_center, width, height) must be in normalized xywh format (from 0 to 1), i.e. divided by the image width and height respectively. You can use my script from GitHub.
Commonly we would need to resize training images to the input size the detection model accepts. Here, however, we use the Darknet YOLOv3 implementation, which performs the resize itself, so we don’t need to resize images.
There is also the observation that the more the width/height/aspect-ratio distribution differs between the training and testing datasets, the worse detection gets. We can mitigate this with data augmentation (source).
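Darknet exposes some of this augmentation in the cfg file itself; for instance, each [yolo] layer has a jitter parameter that randomly crops and resizes training images to vary the aspect ratio. A sketch (values are illustrative, not recommendations):

```ini
[yolo]
# fraction of the image size by which training crops may vary
jitter=.5
# resize the network to a random scale every few batches
random=1
```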
Training YOLOv3 Darknet model
Now we can train the YOLOv3 model. I used the steps from here. Clone the darknet repository and compile it:
git clone https://github.com/pjreddie/darknet.git
cd darknet
make
We can save trained weights to a file. By default, weights are saved every 100th batch, but we can change this: edit the file examples/detector.c and modify the condition at line 138.
We can tweak parameters in the yolov3-obj.cfg file: batch size, max_batches, subdivisions. The GPU processes batch / subdivisions images at any one time.
Input training images are first resized to width × height before training; the default is 416×416. Increasing it to 608×608 can improve results, but training will take longer.
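For illustration, the relevant [net] section of yolov3-obj.cfg might look like this (the values are examples, not recommendations):

```ini
[net]
batch=64          # images per training iteration
subdivisions=16   # batch is split into 16 chunks: 64/16 = 4 images on the GPU at once
width=416         # network input width; training images are resized to this
height=416        # network input height
max_batches=4000  # total number of training iterations
```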
When using a large learning rate we may observe a spike in the loss, i.e. the loss suddenly starts increasing at some point.
It’s recommended to stop training when the loss reaches values around 0.xxx and no longer decreases (source).
Training YOLOv3 on a Google Cloud virtual machine without a GPU can take several days (roughly one batch per hour). On Google Colab with a GPU we get an enormous speedup, completing 1000 batches in around 40 minutes.
We can plot the loss over the course of training. Let’s use this git repo. First, run training with output redirected to a log.txt file:
./darknet detector train ../data/obj.data cfg/yolo-obj.cfg ../data/darknet53.conv.7 > log.txt 2>&1
Then clone the GitHub repo:
git clone https://github.com/Jumabek/darknet_scripts.git
and run the script:
python plot_yolo_log.py ../darknet/log.txt
Here is my loss curve for jitter value 0.5.
The results are better with jitter 0.7:
Testing detection on YOLOv3 with trained weights
First, download this testing script and try it on pretrained weights.
Download weights from here and try detection with these weights on some image:
python opencv_yolo_detector.py --image <image>.jpg --config cfg/yolov3.cfg --weights yolov3.weights --names ./data/coco.names
Here we specify the yolov3.cfg file, the pretrained weights, and the class names for the COCO dataset.
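The core of such a script is decoding the network’s raw output into pixel-space boxes. A simplified sketch of that step (the function name and thresholds are my own; in the full script it would be fed the outputs of an OpenCV cv2.dnn.readNetFromDarknet forward pass):

```python
import numpy as np

def decode_yolo_output(output, img_w, img_h, conf_threshold=0.5):
    """Turn raw YOLO rows (cx, cy, w, h, objectness, class scores...)
    into pixel-space [x, y, w, h] boxes, confidences and class ids."""
    boxes, confidences, class_ids = [], [], []
    for det in output:
        scores = det[5:]
        cls = int(np.argmax(scores))
        conf = float(scores[cls])
        if conf > conf_threshold:
            # coordinates come out normalized to [0, 1]; scale to pixels
            cx, cy = det[0] * img_w, det[1] * img_h
            bw, bh = det[2] * img_w, det[3] * img_h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(conf)
            class_ids.append(cls)
    return boxes, confidences, class_ids
```

The resulting lists can then be passed to cv2.dnn.NMSBoxes to suppress overlapping duplicate detections before drawing.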
If everything works with the pretrained weights, we can try our own weights for the custom dataset.
We can also try detection on video. Here is my script for detecting vehicles in a video file. To run it, use the command
python video_yolo_detector.py --weights <yolo_trained_weights_file>.weights --config cfg/yolo-obj.cfg --names <path to obj.names> --video <path to video>
Once detection is complete, the result is saved to the file result.avi.
That’s it. Enjoy object detection with YOLOv3.