Computer vision is one of the most interesting domains within artificial intelligence. It was largely inspired by the goal of automating tasks that mimic human vision. Advances in deep learning, together with cheaper computing and storage, gave computer vision techniques a big leap forward, and the last decade saw a surge in such techniques. They now provide state-of-the-art solutions for tasks like face detection, face recognition, object detection, image classification, and image-based recommendations.
We as humans recognize a person almost instantly with a glance at their face. Within computer vision, this task of person identification is divided into face detection followed by face recognition. Common applications include identification and authentication, enabling automated attendance, access control systems, suspicious-person identification, and missing-person identification. Social media has not been left untouched by these advances: Facebook suggests tags for your friends when you post a photo, while Google Photos groups and labels the pictures in your gallery and even compiles short videos from a series of vacation photos. Face detection is also widely used in mobile applications, such as Snapchat, for virtual augmentation of human faces with cat faces, dog faces, ornaments, and so on.
Here are the topics we will cover in this blog post:
1. Face Detection Techniques
   - Haar Cascades Classifier: the first machine learning-based cascading classifier fast enough to run on low-power CPUs, such as those in cameras and phones.
   - Histogram of Oriented Gradients (HOG): HOG features combined with a linear Support Vector Machine classifier.
   - Multi-task Cascaded Convolutional Networks (MTCNN): a deep learning-based face detection technique.
2. Face Alignment: also called face normalization; it helps improve face recognition accuracy.
3. Face Recognition: recognize a person in an image by matching against a repository of known-person photos.
Haar Cascades Classifier
This revolutionary method for face detection was presented by Paul Viola and Michael Jones in 2001. It is the first machine learning-based cascading classifier fast enough to run on low-power CPUs, such as those in cameras and phones. The classifier is trained with many 'positive' images (containing a face) and 'negative' images (containing no face) of the same size. The figure below shows the Haar features used to extract information from a window, both during training and at prediction time. They are similar to CNN kernels. Each feature is a single value obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle.
The method cascades several weak classifiers into one strong classifier. Even with a small window size, feature extraction produces a massive feature list, most of which is irrelevant. AdaBoost is used to select the best features. At each stage of the cascade, windows that fail the stage (rejected sub-windows) are discarded and the weights are updated. Cascading continues until the required accuracy or error rate is achieved.
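As a concrete illustration of how such rectangle features are computed cheaply, here is a small sketch (not the Viola-Jones implementation itself, and the function names are our own) that evaluates a two-rectangle Haar feature with an integral image, so any rectangle sum costs only four lookups:

```python
import numpy as np

def integral_image(gray):
    """Cumulative sums so any rectangle sum costs at most four lookups."""
    return gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of gray[y:y+h, x:x+w] using the integral image ii."""
    total = ii[y + h - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0 and y > 0:
        total += ii[y - 1, x - 1]
    return total

def two_rect_haar_feature(gray, x, y, w, h):
    """Edge-style Haar feature: sum under the black (right) half minus
    the sum under the white (left) half, per the definition above."""
    ii = integral_image(gray)
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return black - white
```

On a uniform patch the feature is zero; it responds to vertical edges, which is exactly why a cascade of such features can separate face-like windows from background.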
Implementation in code using OpenCV
OpenCV is an open-source image and video processing library. Install the opencv-python library in your virtual environment using the pip command: `pip install opencv-python`.
To try it out, copy and paste the below code into a Jupyter notebook and change the image file name 'sample.jpg' to your input file name.
Remarks: The user has to tune three parameters, scaleFactor, minNeighbors, and minSize, for each image to reduce false negatives. Hence it is not a fully automatic system.
Histogram of Oriented Gradients (HOG)
This method was introduced by Dalal and Triggs in their 2005 paper, Histograms of Oriented Gradients for Human Detection. They used Histogram of Oriented Gradients features as an image descriptor and trained a Support Vector Machine (SVM) classifier to create a highly accurate human detector. For any object, one can use the HOG image descriptor to train an object classifier; here we will use it for face detection. The implementation pipeline is shown in the figure below.
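To make the descriptor concrete, here is a toy numpy sketch of the core idea, per-cell histograms of gradient orientation weighted by gradient magnitude; the real pipeline adds block normalization and feeds the flattened vector to a linear SVM:

```python
import numpy as np

def hog_cell_histograms(gray, cell=8, bins=9):
    """Toy HOG sketch: per-cell histograms of unsigned gradient orientation,
    weighted by gradient magnitude (no block normalization, for illustration)."""
    gray = gray.astype(np.float64)
    # Central-difference gradients (borders left at zero)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    h, w = gray.shape
    hists = np.zeros((h // cell, w // cell, bins))
    for i in range(h // cell):
        for j in range(w // cell):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            idx = (a / (180.0 / bins)).astype(int) % bins
            for b in range(bins):
                hists[i, j, b] = m[idx == b].sum()
    return hists
```

For a pure horizontal intensity ramp, all of the gradient mass lands in the 0-degree bin, which is the kind of edge-direction statistic the detector is built on.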
Implementation in code
To implement this method we will use the open-source library face_recognition. It is a Python wrapper around the dlib library, which is written in C++. The library comes packed with pre-trained models for face detection and face recognition. You can install face_recognition in your virtual environment using the pip command: `pip install face_recognition`.
To try it out, copy and paste the below code into a Jupyter notebook and change the image file name 'sample_2.jpg' to your input file name.
Remarks: Choose this detector based on your application. It is a good choice for face recognition pipelines because it tends to skip half-visible and covered faces. However, when the landmarks are predicted incorrectly, the face cannot be aligned correctly.
The face_recognition library also provides a deep learning-based pre-trained model for face detection, called 'cnn'. To use it, replace the 'hog' keyword in the detection call with 'cnn'. This detector is fast and accurate if you use a GPU for prediction; you can try it on your own if you have access to one.
Multi-task Cascaded Convolutional Networks (MTCNN)
This method was proposed by Kaipeng Zhang et al. in their paper, Joint Face Detection and Alignment Using Multi-task Cascaded Convolutional Networks. Their deep learning approach cascades three stages of Convolutional Neural Networks to predict the face and five landmark locations, which help in face alignment. In the first stage, the image is passed to the Proposal Network (P-Net), which predicts bounding boxes using regression; non-maximum suppression (NMS) then merges highly overlapped candidates. In the second stage, the surviving candidates are fed to another CNN, the Refine Network (R-Net), which rejects a large number of false candidates, predicts more accurate bounding boxes, and again applies NMS to remove overlapping boxes. The third stage, the Output Network (O-Net), is similar to the second, but this network also predicts the positions of five facial landmarks.
Most deep learning models take a fixed-size input, but MTCNN accepts a color image of any size.
Implementation in code
For this we will use another open-source library, mtcnn. You can install it in your virtual environment using the pip command: `pip install mtcnn`. To try it out, copy and paste the below code into a Jupyter notebook and change the image file name 'sample_2.jpg' to your input file name.
Remarks: This model detects half-visible, covered, and side faces.
Face Alignment
Face alignment is used to improve the accuracy of face recognition. It is a normalization technique that centers the face in the image, rotates it so that the line joining the centers of the two eyes is parallel to the horizontal, and resizes faces to an identical scale.
Implementation in code
Before trying this sample code, make sure numpy is installed in your virtual environment; if it is not, install it using the pip command: `pip install numpy`. To try it out, copy and paste the below code into a Jupyter notebook. It will output an aligned and centered face for a given sample input face image.
Face Recognition
In this step, we use the cropped and aligned face image instead of the whole image. We use the face_recognition library, as it provides a pre-trained model based on the ResNet architecture. This model takes a face image and its landmarks as input and outputs a 128-dimensional feature vector. To find the same person in a new image, compute the Euclidean distance between each face in the new image and the known person's image. The face with the minimum distance is the matching face.
To try it out, copy and paste the below code into a Jupyter notebook. It will give the distance between a given sample input face image and a reference image.
One can try to match faces with different expressions, plain glasses, and goggles, and see how the distance varies.
In this blog, Team Intellica has explained various implementation techniques for face detection and face recognition using artificial intelligence. These methods can detect and recognize faces with high accuracy in real time.
If you're looking for similar tech competence or want to integrate a face detection and recognition solution with your existing system, feel free to reach out at email@example.com