A complete pipeline for tracking pedestrians
YOLOv3 (You Only Look Once) is a model for object detection. The object detection task consists of determining where certain objects are located in an image and classifying those objects. Previous methods, like R-CNN and its variations, performed this task in multiple steps with a pipeline. Such pipelines can be slow to run and hard to optimize, because each individual component must be trained separately. YOLOv3 does it all with a single neural network.
YOLOv3 makes predictions at three scales, obtained by down-sampling the dimensions of the input image by factors of 32, 16 and 8 respectively.
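To make the three scales concrete, here is a small sketch of the arithmetic. The 416x416 input size is an assumption on my part (it is a common choice for YOLOv3, but the text above does not fix one), and the helper names are made up for illustration. The count of three anchor boxes per grid cell follows the standard YOLOv3 configuration.

```python
# Strides at which YOLOv3 predicts, as stated in the text above.
STRIDES = (32, 16, 8)
ANCHORS_PER_CELL = 3  # standard YOLOv3 setting


def yolo_v3_grid_sizes(input_size=416, strides=STRIDES):
    """Return the side length of the prediction grid at each scale."""
    if any(input_size % s for s in strides):
        raise ValueError("input size must be divisible by every stride")
    return [input_size // s for s in strides]


def total_predicted_boxes(input_size=416):
    """Count the candidate boxes produced across all three scales."""
    return sum(g * g * ANCHORS_PER_CELL for g in yolo_v3_grid_sizes(input_size))


print(yolo_v3_grid_sizes(416))     # [13, 26, 52]
print(total_predicted_boxes(416))  # 10647
```

The coarse 13x13 grid is suited to large objects, while the fine 52x52 grid helps with small ones, such as distant pedestrians.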
Manual analysis of pedestrians and crowds is often impractical for massive datasets of surveillance videos. Automatic tracking of humans is one of the essential abilities for computerized analysis of such videos. Pedestrian tracking and counting is an important research area in computer vision. It plays a crucial role in many applications, including intelligent monitoring and traffic safety. Nevertheless, challenges such as large variations in pedestrian posture, background clutter, partial occlusions, and illumination changes complicate the task.
For people tracking, we start with all possible detections in a frame and give each one an ID. In subsequent frames we try to carry forward each person’s ID. If a person has moved out of the frame, that ID is dropped. If a new person appears, they start off with a fresh ID.
This is a difficult task: people can look similar, causing the model to switch IDs; people may get occluded, as when a pedestrian or player is hidden behind someone else; and objects may disappear and reappear in later frames.
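The ID bookkeeping described above can be sketched in a few lines. This is a deliberately naive illustration, not the tracker used later in this article: it matches boxes between frames greedily by intersection over union (IoU), and the class name `NaiveTracker` and the 0.3 IoU cut-off are my own assumptions.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class NaiveTracker:
    """Hypothetical minimal tracker: carry IDs forward by box overlap."""

    def __init__(self, min_iou=0.3):
        self.min_iou = min_iou
        self.tracks = {}            # track ID -> box seen in the last frame
        self._next_id = count(1)

    def update(self, detections):
        unmatched = dict(self.tracks)
        new_tracks = {}
        for box in detections:
            # Find the previous track that overlaps this detection the most.
            best_id, best_score = None, self.min_iou
            for track_id, prev_box in unmatched.items():
                score = iou(box, prev_box)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                best_id = next(self._next_id)   # new person, fresh ID
            else:
                del unmatched[best_id]          # ID carried forward
            new_tracks[best_id] = box
        # Tracks left in `unmatched` have gone from the frame: drop their IDs.
        self.tracks = new_tracks
        return new_tracks


tracker = NaiveTracker()
print(sorted(tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])))        # [1, 2]
print(sorted(tracker.update([(1, 1, 11, 11)])))                          # [1]
print(sorted(tracker.update([(2, 2, 12, 12), (100, 100, 110, 110)])))    # [1, 3]
```

A tracker this simple runs into exactly the problems listed above: it has no notion of appearance, so two people crossing paths can swap IDs, and a person who is occluded for a single frame comes back with a new ID.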
Pedestrian tracking can help when you are searching for someone who is lost in a natural disaster or stuck in a remote location.
2. Infrastructure Planning
Many businesses and government agencies could use a people counter to understand, for example, how crowded public places are at a given time or how many people use a particular street crossing every day.
Monitoring people in crowded places like shopping malls, railway stations and tourist sites using CCTV cameras.
Organizing merchandise in aisles, optimizing store layout, understanding peak times and potentially even protecting against theft in retail stores.
In this algorithm, tracking is based not just on distance and velocity but also on what the person looks like. Deep Sort adds this by computing deep features for every bounding box and factoring the similarity between those features into the tracking logic.
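To illustrate how appearance can be combined with motion, here is a simplified sketch. It is not the Deep Sort implementation: the real algorithm uses a Kalman filter with Mahalanobis distance for motion and solves the assignment with the Hungarian algorithm, whereas this sketch uses a plain centre distance and greedy matching to stay short. The function names, the `weight`, `scale` and `max_cost` parameters, and the two-dimensional feature vectors are all assumptions made for the example.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two appearance feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def center_distance(box_a, box_b, scale=100.0):
    """Distance between box centres, scaled and capped to the range [0, 1]."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return min(math.hypot(ax - bx, ay - by) / scale, 1.0)


def combined_cost(track, detection, weight=0.5):
    """Blend the motion cost and the appearance cost into one number."""
    motion = center_distance(track["box"], detection["box"])
    appearance = cosine_distance(track["feature"], detection["feature"])
    return weight * motion + (1.0 - weight) * appearance


def match(tracks, detections, max_cost=0.5):
    """Greedily pair track IDs with detection indices, cheapest pairs first."""
    candidates = sorted(
        (combined_cost(t, d), track_id, det_idx)
        for track_id, t in tracks.items()
        for det_idx, d in enumerate(detections)
    )
    assigned, used = {}, set()
    for cost, track_id, det_idx in candidates:
        if cost > max_cost:
            break
        if track_id in assigned or det_idx in used:
            continue
        assigned[track_id] = det_idx
        used.add(det_idx)
    return assigned


# Two people standing close together: position alone cannot tell them apart.
tracks = {
    1: {"box": (0, 0, 10, 20), "feature": [1.0, 0.0]},
    2: {"box": (12, 0, 22, 20), "feature": [0.0, 1.0]},
}
detections = [
    {"box": (6, 0, 16, 20), "feature": [0.0, 1.0]},
    {"box": (6, 0, 16, 20), "feature": [1.0, 0.0]},
]
print(match(tracks, detections))  # {1: 1, 2: 0}
```

Both detections are equally far from both tracks, so only the appearance term decides the pairing. That is the situation where an appearance feature helps avoid ID switches.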
Let’s get started with the code. I have used Google Colab for this project. Feel free to check out the corresponding notebook here.
I started by setting up the Deep Sort algorithm from pzq’s GitHub repository. Feel free to check it out here. I also imported the libraries required for this project.
Next I downloaded the YOLO weights from my Google Drive. Feel free to download the weights and use them locally.
I continued by importing some other libraries and setting the YOLO weights with the help of the deep_sort package.
Next I grabbed a video from the Active Vision Laboratory of Oxford University. I converted the video to mp4 using ffmpeg. Let’s also display a sample image using IPython display.
With the help of deep_sort, I used four variables to keep track of the four coordinates of each bounding box. To keep only reliable bounding boxes, I applied a threshold before tracking pedestrians in real time.
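The step above can be sketched roughly as follows. This is not the notebook's code and it does not use the deep_sort API. It assumes detections arrive as `(cx, cy, w, h, confidence)` tuples, that the threshold applies to the confidence score, and that 0.5 is a reasonable cut-off. All of these, along with the function names, are assumptions for illustration.

```python
def to_corners(cx, cy, w, h, frame_w, frame_h):
    """Convert a centre/size box to four corner coordinates, clipped to the frame."""
    x1 = max(0, int(cx - w / 2))
    y1 = max(0, int(cy - h / 2))
    x2 = min(frame_w - 1, int(cx + w / 2))
    y2 = min(frame_h - 1, int(cy + h / 2))
    return x1, y1, x2, y2


def filter_detections(detections, threshold=0.5, frame_size=(640, 480)):
    """Keep boxes whose confidence reaches the threshold, as corner coordinates."""
    frame_w, frame_h = frame_size
    return [
        to_corners(cx, cy, w, h, frame_w, frame_h)
        for cx, cy, w, h, confidence in detections
        if confidence >= threshold
    ]


detections = [
    (100, 100, 40, 80, 0.9),   # confident detection, kept
    (300, 200, 30, 60, 0.2),   # low confidence, dropped
    (10, 10, 40, 40, 0.7),     # partly outside the frame, clipped
]
print(filter_detections(detections))  # [(80, 60, 120, 140), (0, 0, 30, 30)]
```

Raising the threshold removes more false positives but risks losing partially occluded pedestrians, so it is worth tuning on a few frames of your own video.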
Finally, I called the function to run pedestrian tracking on the video.
- If the bounding boxes are too big, then too much of the background is “captured” in the features, reducing the effectiveness of the algorithm.
- If people are dressed similarly, as happens in sports, that can result in similar features and ID switching.
There are many opportunities in pedestrian tracking, both in new applications and in new methods for pushing state-of-the-art results. Even though this was just a general overview of pedestrian tracking using the YOLO and Deep Sort algorithms, I hope it gives you a basic understanding and a baseline for going deeper.
The corresponding notebook can be found here.
Happy reading, happy learning and happy coding.
If you want to keep up to date with my latest articles and projects, follow me on Medium. These are some of my contact details: