So many of us have used different Facebook applications to see ourselves aged, turned into rock stars, or wearing festive make-up. Such waves of facial transformations are usually accompanied by warnings not to share images of your face – otherwise, they will be processed and misused.
But how does AI use faces in reality? Let’s discuss state-of-the-art applications for face detection and recognition.
First, detection and recognition are different tasks. Face detection is a crucial part of face recognition: it determines the number of faces in a picture or video without remembering or storing details. It may estimate some demographic data like age or gender, but it cannot recognize individuals.
Face recognition identifies a face in a photo or a video image against a pre-existing database of faces. Faces first need to be enrolled into the system to create the database of unique facial features. Afterward, the system breaks down a new image into key features and compares them against the information stored in the database.
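The enroll-then-compare flow can be sketched in a few lines. This is only an illustration: the three-number feature vectors, the `FaceDatabase` class, and the 0.6 distance threshold are all hypothetical, and a real system would obtain its features from a trained model.

```python
import math

def euclidean(a, b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class FaceDatabase:
    """Toy gallery of enrolled faces, keyed by person name (hypothetical)."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold  # assumed cut-off, not a recommended value
        self.enrolled = {}

    def enroll(self, name, features):
        """Store the feature vector of a known person."""
        self.enrolled[name] = list(features)

    def identify(self, features):
        """Return the closest enrolled name, or None if nobody is close enough."""
        best_name, best_dist = None, float("inf")
        for name, stored in self.enrolled.items():
            dist = euclidean(stored, features)
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name if best_dist <= self.threshold else None

db = FaceDatabase()
db.enroll("alice", [0.10, 0.80, 0.30])
db.enroll("bob", [0.90, 0.20, 0.50])

match = db.identify([0.12, 0.79, 0.33])     # close to alice's enrolled vector
stranger = db.identify([0.50, 0.50, 0.95])  # far from everyone in the database
```

The threshold is what separates "this is a known person" from "this face is not in the database", so in practice it has to be tuned on real data.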
To detect faces, the computer examines either a photo or a video image and tries to distinguish faces from any other objects in the background. There are several methods that a computer can use to achieve this, compensating for illumination, orientation, or camera distance. Yang, Kriegman, and Ahuja presented a classification for face detection methods. These methods are divided into four categories, and a face detection algorithm can belong to two or more groups.
Knowledge-based face detection
This method relies on a set of rules developed by humans according to our knowledge of faces. We know that a face must have a nose, eyes, and a mouth within certain distances and positions relative to each other. The problem with this method is building an appropriate set of rules. If the rules are too general, the system ends up with many false positives; if they are too detailed, it misses faces that do not satisfy every rule. In addition, the method does not work equally well for all skin colors and depends on lighting conditions that can change the exact hue of a person’s skin in the picture.
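As a rough illustration of what such hand-written rules might look like, the sketch below checks the relative positions of four candidate landmarks. The function name `looks_like_face` and every threshold in it are assumptions made up for this example.

```python
def looks_like_face(left_eye, right_eye, nose, mouth):
    """Apply a few hand-written geometric rules to candidate landmarks.

    Points are (x, y) pixel coordinates with y growing downwards.
    All thresholds are illustrative assumptions.
    """
    eye_dist = right_eye[0] - left_eye[0]
    if eye_dist <= 0:
        return False
    # Rule 1: the eyes sit at roughly the same height.
    eyes_level = abs(left_eye[1] - right_eye[1]) < 0.2 * eye_dist
    # Rule 2: the nose lies horizontally between the eyes and below them.
    nose_ok = (left_eye[0] < nose[0] < right_eye[0]
               and nose[1] > max(left_eye[1], right_eye[1]))
    # Rule 3: the mouth is below the nose, within 1.5 eye distances of the eye line.
    eye_line = (left_eye[1] + right_eye[1]) / 2
    mouth_ok = mouth[1] > nose[1] and (mouth[1] - eye_line) < 1.5 * eye_dist
    return eyes_level and nose_ok and mouth_ok

face = looks_like_face((30, 40), (70, 41), (50, 60), (50, 80))
not_face = looks_like_face((30, 40), (70, 75), (50, 60), (50, 80))  # eyes not level
```

Loosening the 0.2 and 1.5 factors accepts more non-faces, while tightening them rejects tilted or unusual faces, which is exactly the trade-off in choosing the rules.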
Template matching face detection
The template matching method uses predefined or parameterized face templates to locate or detect faces by the correlation between the templates and input images. The face model can be constructed from edges using an edge detection method.
A variation of this approach is the controlled background technique. If you are lucky to have a frontal face image and a plain background, you can remove the background, leaving face boundaries.
For this approach, the software has several classifiers for detecting various types of front-on faces and some for profile faces, such as detectors of eyes, a nose, a mouth, and in some cases, even a whole body. While the approach is easy to implement, it is usually inadequate for face detection.
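The correlation step at the heart of template matching can be sketched with normalized cross-correlation on tiny grayscale grids. The helper names `ncc` and `best_match` and the 3x3 "template" are hypothetical; real systems work on full images and usually on several scales.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation between two equally sized 2D grids."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den else 0.0  # flat patches carry no correlation signal

def best_match(image, template):
    """Slide the template over the image; return (score, row, col) of the best fit."""
    th, tw = len(template), len(template[0])
    best = (-2.0, 0, 0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, r, c)
    return best

# Hypothetical 3x3 template: two bright "eyes" on top, a bright "mouth" below.
template = [[9, 1, 9],
            [1, 1, 1],
            [1, 9, 1]]
image = [[0, 0, 0, 0, 0],
         [0, 0, 9, 1, 9],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 9, 1],
         [0, 0, 0, 0, 0]]
score, row, col = best_match(image, template)
```

Because the score is normalized, a uniformly brighter or darker copy of the template still correlates strongly, but a rotated or rescaled face does not, which is one reason plain template matching is rarely sufficient on its own.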
Feature-based face detection
The feature-based method extracts structural features of the face. A classifier is trained on these features and then used to differentiate facial and non-facial regions. One example of this method is color-based face detection, which scans color images or videos for areas with typical skin color and then looks for face segments.
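A minimal sketch of the color-based idea: mark pixels whose RGB values fall into an assumed "skin" range, then take the bounding box of the marked area as a face candidate. The thresholds in `is_skin` are illustrative assumptions, not values from this article, and the helper names are hypothetical.

```python
def is_skin(r, g, b):
    """Rough RGB skin test; every threshold here is an illustrative assumption."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def skin_mask(image):
    """Boolean grid marking which (r, g, b) pixels pass the skin test."""
    return [[is_skin(*px) for px in row] for row in image]

def bounding_box(mask):
    """Return (top, left, bottom, right) around all True cells, or None."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, hit in enumerate(row) if hit]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

BG = (20, 40, 200)      # bluish background pixel
SKIN = (200, 140, 120)  # a pixel inside the assumed skin range
image = [[BG, SKIN, SKIN, BG],
         [BG, SKIN, SKIN, BG],
         [BG, BG, BG, BG]]

candidate = bounding_box(skin_mask(image))
dim_pixel = is_skin(80, 50, 40)  # the same kind of surface under poor lighting
```

Fixed thresholds like these are sensitive to lighting: the dim pixel in the last line fails the test although it could belong to the same face.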
Haar Feature Selection relies on similar properties of human faces to form matches from facial features: the location and size of the eyes, mouth, and bridge of the nose, and the oriented gradients of pixel intensities. There are 38 layers of cascaded classifiers that together use 6061 features to detect a frontal face. Pre-trained classifiers are available, for example, in OpenCV.
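To make the idea of a Haar-like feature concrete, here is a small sketch of one two-rectangle feature computed with an integral image, the summed-area table used by Viola-Jones-style detectors (a detail not covered in this article). The 4x4 "image" and the function names are hypothetical.

```python
def integral_image(img):
    """ii[r][c] holds the sum of img[0..r-1][0..c-1]; row 0 and column 0 are zero padding."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_vertical(ii, top, left, height, width):
    """Haar-like feature: brightness of the lower half minus the upper half."""
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return lower - upper

# Toy patch: a dark band (eyes) above a bright band (cheeks).
img = [[10, 10, 10, 10],
       [10, 10, 10, 10],
       [90, 90, 90, 90],
       [90, 90, 90, 90]]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 4, 4)
feature = two_rect_vertical(ii, 0, 0, 4, 4)
```

A large positive value means the region is darker on top than below, the kind of contrast the eye area shows against the cheeks. A cascade evaluates many such features and rejects most windows after only a few of them.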
Histogram of Oriented Gradients (HOG) is a feature extractor for object detection. The features extracted are the distribution (histograms) of directions of gradients (oriented gradients) of the image.
Gradients are typically large around edges and corners, which allows us to detect those regions. Instead of considering the pixel intensities directly, the method counts the occurrences of gradient vectors representing the light direction to localize image segments. It uses overlapping local contrast normalization to improve accuracy.
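The histogram for a single cell can be sketched as below. This is a simplified illustration: it weights each vote by gradient magnitude, as HOG implementations commonly do, uses 9 unsigned orientation bins as an assumed setting, and normalizes one histogram instead of overlapping blocks of cells.

```python
import math

def cell_histogram(cell, bins=9):
    """Orientation histogram (0-180 degrees, unsigned) for one grayscale cell."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

def l2_normalize(hist, eps=1e-6):
    """Contrast normalization; real HOG applies this over overlapping blocks of cells."""
    norm = math.sqrt(sum(v * v for v in hist) + eps ** 2)
    return [v / norm for v in hist]

# Toy cell with a vertical edge: dark on the left, bright on the right.
cell = [[0, 0, 100, 100] for _ in range(4)]
hist = cell_histogram(cell)
normalized = l2_normalize(hist)
```

For this cell all gradient energy lands in the first bin, because the intensity changes only along the horizontal axis. Thanks to the normalization, a dimmer copy of the same edge produces nearly the same descriptor.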
Appearance-based face detection
The more advanced appearance-based method depends on a set of representative training face images to learn face models. It relies on machine learning and statistical analysis to find the relevant characteristics of face images and extract features from them. This method unites several algorithms:
The eigenface-based algorithm efficiently represents faces using Principal Component Analysis (PCA). PCA is applied to a set of images to lower the dimension of the dataset while best describing the variance of the data. In this method, a face can be modeled as a linear combination of eigenfaces (a set of eigenvectors). Face recognition, in this case, is based on comparing the coefficients of the linear representation.
Distribution-based algorithms like PCA and Fisher’s Discriminant define the subspace representing facial patterns. They usually have a trained classifier that identifies instances of the target pattern class from the background image patterns.
The Hidden Markov Model is a standard method for detection tasks. Its states would be the facial features, usually described as strips of pixels.
Sparse Network of Winnows defines two linear units or target nodes: one for face patterns and the other for non-face patterns.
Naive Bayes Classifiers compute the probability of a face appearing in the picture based on the frequency of occurrence of a series of patterns over the training images.
Inductive learning uses algorithms such as Quinlan’s C4.5 or Mitchell’s FIND-S to detect faces, starting with the most specific hypothesis and generalizing.
Neural networks, such as convolutional neural networks (CNNs), are among the most recent and most powerful methods for detection problems, including face detection, emotion detection, and face recognition.
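Of the algorithms listed above, the eigenface idea is the easiest to show in miniature. The sketch below is a toy under strong assumptions: each "image" is a hypothetical 4-pixel vector, only the first eigenface is kept, and the eigenvector is found with plain power iteration rather than a numerical library.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_eigenvector(matrix, iterations=50):
    """Power iteration: approximates the unit eigenvector with the largest eigenvalue."""
    v = [float(i + 1) for i in range(len(matrix))]  # arbitrary non-zero start
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

# Hypothetical gallery: each "image" is flattened to a 4-pixel vector.
gallery = {"alice": [10, 10, 0, 0], "bob": [0, 0, 10, 10]}
faces = list(gallery.values())
n, d = len(faces), len(faces[0])

mean_face = [sum(f[i] for f in faces) / n for i in range(d)]
centered = [[p - m for p, m in zip(f, mean_face)] for f in faces]
covariance = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
              for i in range(d)]
eigenface = top_eigenvector(covariance)  # first principal component

def coefficient(face):
    """Project a face onto the eigenface after subtracting the mean face."""
    return dot([p - m for p, m in zip(face, mean_face)], eigenface)

coefficients = {name: coefficient(face) for name, face in gallery.items()}

def identify(face):
    """Return the enrolled name whose coefficient is closest to the probe's."""
    c = coefficient(face)
    return min(coefficients, key=lambda name: abs(coefficients[name] - c))

probe = [9, 10, 1, 0]  # a slightly noisy version of alice's image
identified = identify(probe)
```

Each face is reduced from four pixel values to a single coefficient, and recognition compares only those coefficients. With real images, one would keep several eigenfaces and compare coefficient vectors instead.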
Video Processing: Motion-based face detection
In video, you can use movement as a guide. One specific face movement is blinking, so if the software can detect a regular blinking pattern, it can locate the face.
Various other motions indicate that the image may contain a face, such as flared nostrils, raised eyebrows, wrinkled foreheads, and opened mouths. When a face is detected and a particular face model matches with a specific movement, the model is laid over the face, enabling face tracking to pick up further face movements. The state-of-the-art solutions usually combine several methods, extracting features, for example, to be used in machine learning or deep learning algorithms.
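One simple way to use movement as a guide is frame differencing: compare two consecutive frames and look at where the pixels changed. The sketch below is a generic illustration rather than a method described in this article; the function name `moving_region`, the threshold of 25, and the tiny frames are assumptions.

```python
def moving_region(prev_frame, next_frame, threshold=25):
    """Bounding box (top, left, bottom, right) of pixels that changed between frames."""
    changed = [(r, c)
               for r, (row_a, row_b) in enumerate(zip(prev_frame, next_frame))
               for c, (a, b) in enumerate(zip(row_a, row_b))
               if abs(a - b) > threshold]
    if not changed:
        return None
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return min(rows), min(cols), max(rows), max(cols)

# Toy grayscale frames: two dark "eyes" are open in the first frame, closed in the second.
eyes_open = [[100, 100, 100, 100, 100],
             [100, 20, 100, 20, 100],
             [100, 100, 100, 100, 100],
             [100, 100, 100, 100, 100]]
eyes_closed = [[100] * 5 for _ in range(4)]

blink_area = moving_region(eyes_open, eyes_closed)
no_motion = moving_region(eyes_closed, eyes_closed)
```

A blink detector would then check whether such a small changed region reappears at a regular rhythm before treating it as evidence of a face.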
Face detection tools
There are dozens of face detection solutions, both proprietary and open-source, that offer various features, from simple face detection to emotion detection and face recognition.
Proprietary face detection software
Amazon Rekognition is based on deep learning and is fully integrated into the Amazon Web Services ecosystem. It is a robust solution both for face detection and recognition, and it can detect eight basic emotions such as “happy”, “sad”, and “angry”. You can also detect up to 100 faces in a single image with this tool. There is an option for video, and the pricing is different for different kinds of usage.
Face++ operates primarily in China, is exceptionally well funded, and is known for its inclusion in Lenovo products. However, bear in mind that its parent company, Megvii, was sanctioned by the US government in late 2019.
Face Recognition and Face Detection API (Lambda Labs) provides face recognition, facial detection, eye position, nose position, mouth position, and gender classification. It offers 1000 free requests per month.
Kairos offers a variety of image recognition solutions. Their API endpoints include identifying gender, age, facial recognition, and emotional depth in photo and video. They offer a 14-day free trial with a maximum limit of 10000 requests, providing SDKs for PHP, JS, .Net, and Python.
Microsoft Azure Cognitive Services Face API allows you to make 30000 requests per month, at 20 requests per minute, on a free basis. For paid requests, the price depends on the number of recognitions per month, starting from $1 per 1000 recognitions. Features include age estimation, gender and emotion recognition, and landmark detection. SDKs support Go, Python, Java, .Net, and Node.js.
Paravision is a face recognition company for enterprises providing self-hosted solutions. Face and activity recognition and COVID-19 solutions (face recognition with masks, integration with thermal detection, etc.) are among their services. The company has SDKs for C++ and Python.
Trueface also serves enterprises, providing features like gender recognition, age estimation, and landmark detection as a self-hosted solution.
Open-source face detection solutions
Ageitgey/face_recognition is a GitHub repository with 40k stars and one of the most extensive face recognition libraries. The contributors also claim it to be the “simplest facial recognition API for Python and the command line.” However, its drawbacks are that the latest release dates back to 2018 and that its model recognition accuracy of 99.38% could be much better in 2021. It also does not have a REST API.
Deepface is a framework for Python with 1.5k stars on GitHub, providing facial attribute analysis like age, gender, race, and emotion. It also provides a REST API.
FaceNet, developed by Google, is implemented as a Python library. The repository boasts 11.8k stars. However, the last significant updates were in 2018. The recognition accuracy is 99.65%, and it does not have a REST API.
InsightFace is another Python library with 9.2k stars on GitHub, and the repository is actively updated. The recognition accuracy is 99.86%. They claim to provide a variety of algorithms for face detection, recognition, and alignment.
InsightFace-REST is an actively updated repository that “aims to provide convenient, easy deployable and scalable REST API for InsightFace face detection and recognition pipeline using FastAPI for serving and NVIDIA TensorRT for optimized inference.”
OpenCV isn’t an API, but it is a valuable tool with over 3,000 optimized computer vision algorithms. It offers many options for developers, including the EigenFaceRecognizer and LBPHFaceRecognizer face recognition modules.
OpenFace is a Python and Torch implementation of face recognition with deep neural networks. It is based on the CVPR 2015 paper FaceNet: A Unified Embedding for Face Recognition and Clustering.
Face detection is the first step for further face analysis, including recognition, emotion detection, or face generation. It is crucial to all of these tasks because it collects the data needed for further processing. Robust face detection is a prerequisite for sophisticated recognition, tracking, and analytics tools and the cornerstone of computer vision.
Originally posted on SciForce blog.