Implementation
Now that we have clarified the theory, we can jump straight into the implementation.
In our implementation we're going to be using scikit-learn, OpenCV, TensorFlow, and a pre-trained FaceNet model.
Directory Structure
├── 20170511-185253 (FaceNet model)
│ ├── 20170511-185253.pb
│ ├── model-20170511-185253.ckpt-80000.data-00000-of-00001
│ ├── model-20170511-185253.ckpt-80000.index
│ └── model-20170511-185253.meta
├── cls
│ └── my_classifier.pkl
├── data
│ ├── det1.npy
│ ├── det2.npy
│ └── det3.npy
├── faces
│ └── aligned photos automatically generated from raw_faces
├── labelled_faces
│ └── one folder per person, named after that person
├── raw_faces
│ └── Add your group photos here
├── raw_faces_to_aligned_faces.py
├── making_classifier.py
├── ModelManagement.py
├── labeling_faces.py
├── detect_face.py
├── facenet.py
├── classifying_static_image.py
├── clustering_faces.py
├── combine_cluster_folder.py
Step 1: Identifying the faces.
MTCNN (Multi-task Cascaded Convolutional Networks)
The MTCNN algorithm works in three stages and uses one neural network for each.
The first part is a proposal network (P-Net): it predicts potential face positions and their bounding boxes, much like the region proposal network in Faster R-CNN. This step outputs a large number of candidate detections, many of which are false positives.
The second part (R-Net) takes the image crops from the first stage and refines the results, eliminating most of the false detections and aggregating overlapping bounding boxes.
The last part (O-Net) refines the predictions even further and adds facial landmark predictions (in the original MTCNN implementation).
Align the faces using MTCNN.
This detects and aligns the faces by making the eyes and bottom lip appear in the same location on each image. The detect_face function returns two variables: the bounding boxes and the facial landmark points.
import tensorflow as tf
import detect_face  # from the FaceNet repository

minsize = 40                 # minimum size of a face, in pixels
threshold = [0.6, 0.7, 0.7]  # thresholds for the three stages (P-Net, R-Net, O-Net)
factor = 0.709               # scale factor for the image pyramid
margin = 44                  # margin added around each detected face when cropping

with tf.Graph().as_default():
    sess = tf.Session()
    with sess.as_default():
        # Load the three MTCNN networks from the .npy weight files in ./data
        pnet, rnet, onet = detect_face.create_mtcnn(sess, './data')

# img is the input photo loaded as an RGB NumPy array
bounding_boxes, _ = detect_face.detect_face(
    img, minsize, pnet, rnet, onet, threshold, factor)
Step 2: Cropping the faces
Here we are only concerned with the bounding boxes, so we take the bounding boxes of each detection. Using those four coordinates we crop all the faces present in the image.
import numpy as np

nrof_faces = bounding_boxes.shape[0]
det_multiple_faces = bounding_boxes[:, 0:4]
for i in range(nrof_faces):
    det = np.squeeze(det_multiple_faces[i, :])
    bb_temp = np.zeros(4, dtype=np.int32)
    bb_temp[0] = det[0]  # x1 (left)
    bb_temp[1] = det[1]  # y1 (top)
    bb_temp[2] = det[2]  # x2 (right)
    bb_temp[3] = det[3]  # y2 (bottom)
    # Crop the face (image rows are y, columns are x)
    cropped = img[bb_temp[1]:bb_temp[3], bb_temp[0]:bb_temp[2], :]
Step 3: Clustering them together.
Clustering is the process of partitioning a set of data (here, images) into meaningful sub-classes, called clusters.
In this project, we cluster the cropped face images by first creating image embeddings with the pre-trained FaceNet model. We then cluster those embeddings using scikit-learn's KMeans algorithm and create a new folder for each cluster, so that at the end of this process every folder contains pictures of the same person.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import math
import numpy as np
import tensorflow as tf

import facenet  # from the FaceNet repository

with tf.Graph().as_default():
    with tf.Session() as sess:
        # Load the aligned face images generated in Steps 1-2
        datadir = './faces/'
        dataset = facenet.get_dataset(datadir)
        paths, labels = facenet.get_image_paths_and_labels(dataset)

        # Load the pre-trained FaceNet model
        modeldir = './20170511-185253/20170511-185253.pb'
        facenet.load_model(modeldir)

        images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
        phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("phase_train:0")
        embedding_size = embeddings.get_shape()[1]

        batch_size = 25
        image_size = 160
        nrof_images = len(paths)
        nrof_batches_per_epoch = int(math.ceil(1.0 * nrof_images / batch_size))
        emb_array = np.zeros((nrof_images, embedding_size))

        # Compute the embeddings batch by batch
        for i in range(nrof_batches_per_epoch):
            start_index = i * batch_size
            end_index = min((i + 1) * batch_size, nrof_images)
            paths_batch = paths[start_index:end_index]
            images = facenet.load_data(paths_batch, False, False, image_size)
            feed_dict = {images_placeholder: images, phase_train_placeholder: False}
            emb_array[start_index:end_index, :] = sess.run(embeddings, feed_dict=feed_dict)

        print(emb_array, paths)
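The script above only produces the embedding matrix. For the actual clustering, a minimal sketch using scikit-learn's KMeans could look like the following (the cluster count of nine matches Step 4 below, while the ./clusters output layout is my assumption, not from the original scripts):

import os
from shutil import copyfile
from sklearn.cluster import KMeans

# Cluster the face embeddings into 9 groups (see Step 4)
kmeans = KMeans(n_clusters=9, random_state=0).fit(emb_array)

# Copy each face image into a folder named after its cluster id
# ('./clusters' is an assumed output directory)
for path, cluster_id in zip(paths, kmeans.labels_):
    out_dir = os.path.join('./clusters', str(cluster_id))
    os.makedirs(out_dir, exist_ok=True)
    copyfile(path, os.path.join(out_dir, os.path.basename(path)))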
Step 4: Removing the Noise.
During the process of face cropping and alignment, a lot of false faces were generated. We need to delete those false detections to improve our overall workflow.
On the left side of Fig. 6 we can see that there are three clusters overall, depicting false images, men, and women. After I deleted the noisy images, only two major clusters remained, depicting men and women, as shown on the right side of Fig. 6.
During the clustering process, I grouped the images into nine sub-parts. The power of clustering is that it collected almost all of the noisy images together in the same sub-part, which made the cleanup very easy: to remove the noisy images, I simply deleted the entire folder containing them that was generated at clustering time (as depicted in the GIF). We repeat this several times until all the noisy images are removed.
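In code, removing such a cluster is a one-liner; a sketch, assuming the folder layout from the clustering sketch above and that cluster 3 turned out to hold the false faces:

import shutil

# Assumption: './clusters/3' is the folder that collected the false detections
shutil.rmtree('./clusters/3')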
We can see the drastic difference between Fig. 7 and Fig. 8 in the number of noisy images. If we look closely at the top-center part of Fig. 7, we can see that those are not real faces but false detections.
Step 5: Labeling Images.
As we repeat Step 4, similar faces start ending up in the same clusters. This removes the tedious task of labelling each image individually: instead, we can label each sub-cluster with the name of the person it contains. At the end, we have many folders, each holding the images of one particular person.
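In code, labelling a cluster then amounts to renaming its folder; a sketch, assuming cluster 0 contains only photos of a (hypothetical) person named John:

import os

# Assumption: './clusters/0' was inspected and contains only John's photos
os.rename('./clusters/0', './labelled_faces/John')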
Step 6: Making a classifier.
At this point I had multiple folders, each named after the person whose images are stored in it.
Using the folder names as labels and the images as example inputs, we will build a classifier. We feed the already cropped and aligned images into the pre-trained FaceNet model, which creates an embedding for each face. We then train scikit-learn's SVM classifier on those embeddings.
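A minimal training sketch, reusing emb_array, labels, and dataset from the embedding script in Step 3 (this time run over the labelled_faces folders); storing (model, class_names) in the pickle follows the FaceNet repository's classifier example and is an assumption here:

import pickle
from sklearn.svm import SVC

# Train a linear SVM on the face embeddings
model = SVC(kernel='linear', probability=True)
model.fit(emb_array, labels)

# Map the numeric labels back to the person (folder) names
class_names = [cls.name.replace('_', ' ') for cls in dataset]
with open('./cls/my_classifier.pkl', 'wb') as outfile:
    pickle.dump((model, class_names), outfile)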
Given an input image, we first detect all the faces in it (Steps 1-2) and then use the classifier above to identify each person's name.
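A sketch of that prediction step, assuming emb_array now holds the embeddings of the faces detected and cropped from the new input image:

import pickle
import numpy as np

with open('./cls/my_classifier.pkl', 'rb') as infile:
    model, class_names = pickle.load(infile)

predictions = model.predict_proba(emb_array)    # per-person probabilities
best_class = np.argmax(predictions, axis=1)     # most likely person per face
best_prob = predictions[np.arange(len(best_class)), best_class]
for idx, prob in zip(best_class, best_prob):
    print('%s: %.3f' % (class_names[idx], prob))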
Finally, the much-awaited results! 🎉