Featuring more data engineering with SQL, the Kaggle Wheat Detection competition, and research exploring demographic biases in facial recognition technology.
Welcome to My Week in AI! Each week this blog will have the following parts:
- What I have done this week in AI
- An overview of an exciting and emerging piece of AI research
Exploring New Types of Databases
This week’s focus has been more data engineering. I finished the ‘Master SQL for Data Science’ learning pathway on LinkedIn Learning, which I found very useful and interesting. I am planning to learn more about document databases (specifically MongoDB) and graph databases (specifically Neo4j), and I am currently setting up my own projects on both platforms to dive deeper into their intricacies. Up next on my big data and data engineering path are Hadoop, Kafka, and more advanced Spark.
Experimenting with Object Detection
I’ve also been working on a project: the Kaggle Wheat Detection competition, essentially an object detection computer vision task. The aim is to identify individual wheat plant heads in images of agricultural fields, and there are two main challenges: plants may grow very densely, making individual heads hard to separate, and images may be blurred by wind. My first step was loading and cleaning the data, and now I am trying some quick, rough transfer learning approaches to establish the baseline performance I can achieve. Further steps for improving performance might be augmenting the training data, training my own classifier, and using transfer learning to generate image embeddings as input to my own classifier.
Biases in Facial Recognition Algorithms
This week’s research paper is in a field I touched on in last week’s post: ethics and fairness in AI. There is concern among local and federal government agencies about inconsistent accuracy across demographic groups when identifying faces with facial recognition technology, and this has led several US cities and other governing bodies to ban the use of such technology in the public sphere.
The research I’m highlighting this week is by Garcia et al. and titled ‘The Harms of Demographic Bias in Deep Face Recognition Research’.¹ The authors showed that the accuracy of three state-of-the-art facial recognition algorithms — VGGFace, DLib and FaceNet — dropped significantly for non-Caucasians compared to Caucasians, and for women compared to men. The authors presented their results as the mean Euclidean distances between the face embeddings of different subjects within each demographic category tested. For example, using the FaceNet algorithm, they found that the mean Euclidean distance between embeddings for Caucasian Male samples was ~1.343, but only ~1.02 for Asian Females. This means that Asian Females had more similar embeddings on average, and so were more likely to be misidentified. This difference of ~0.3 between the two demographic groups is significant relative to typical Euclidean distance values for such embeddings.
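To make that metric concrete, here is a minimal sketch (my own illustration with made-up embeddings, not the paper's code) of the within-group statistic the authors report: the mean Euclidean distance over all pairs of face embeddings in a group. A smaller mean means the group's embeddings are packed more tightly, so two different people are more easily confused.

```python
import numpy as np

def mean_pairwise_distance(embeddings: np.ndarray) -> float:
    """Mean Euclidean distance over all pairs of embedding vectors."""
    n = len(embeddings)
    pairs = [np.linalg.norm(embeddings[i] - embeddings[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(pairs))

# Toy 'embeddings': a tightly clustered group vs. a spread-out group.
tight = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0]])
spread = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
# The tightly clustered group has the smaller mean distance, i.e. its
# members are more easily confused with one another.
print(mean_pairwise_distance(tight) < mean_pairwise_distance(spread))  # True
```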
After evaluating the facial recognition algorithms on data from different demographic groups, the researchers demonstrated the ease of performing a morphing attack on these algorithms in an automated border control (ABC) scenario. ABCs use facial recognition software to compare a person to the image on their travel document, and open immigration gates if the two are deemed a match. A morphing attack involves an accomplice and an imposter who work together to create a fake document with a combination of their two faces. Garcia et al. developed morphed images from a set of ‘accomplice’ images and tested these on the FaceNet algorithm. They found that attacks based on Asian faces worked 4.32% of the time, compared to 0.27% of the time for Caucasian faces. This is a practical illustration of the consequences that demographic bias in facial recognition technology can have.
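For intuition only, the sketch below shows the crudest possible way to combine two aligned face images: pixel-wise alpha blending. Real morphing attacks, including those studied in the paper, warp facial landmarks before blending, so this is a deliberately simplified illustration rather than the authors' method; the function name `naive_morph` is mine.

```python
import numpy as np

def naive_morph(face_a: np.ndarray, face_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Pixel-wise alpha blend of two aligned images of identical shape.

    Real morphing attacks align and warp facial landmarks before blending;
    this is only the simplest illustration of mixing two faces.
    """
    blended = alpha * face_a.astype(np.float64) + (1.0 - alpha) * face_b.astype(np.float64)
    return blended.astype(np.uint8)

# Two flat 'images' blend to the midpoint intensity.
a = np.full((2, 2, 3), 100, dtype=np.uint8)
b = np.full((2, 2, 3), 200, dtype=np.uint8)
morphed = naive_morph(a, b)  # every pixel becomes 150
```

Because the morph sits "between" both faces in embedding space, a verification system with a loose match threshold can accept it for either person, which is what makes the ABC scenario vulnerable.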