Welcome to My Week in AI! Each week this blog will have the following parts:
- What I have done this week in AI
- An overview of an exciting and emerging piece of AI research
Visualizing Data Interactively
I’ve spent this week working on a dashboarding project using Plotly and Dash. Dashboarding is new for me, and I have been excited to learn Dash whilst building something with real data. I have found Dash intuitive and easy to use, largely because it is a declarative and reactive library. Like Altair, which I discussed in Part 3, Dash is an all-Python library, making it easy for data scientists like myself to develop interactive, aesthetically pleasing dashboards.
I’ve also spent some time at the Spark + AI Summit by Databricks, which began on Monday. It is a conference I had been greatly looking forward to, and I am especially interested in the talks on productionizing machine learning models and on using MLflow. I’ll be sharing my thoughts and reactions in next week’s blog.
Model Inversion Attacks with Generative Models
The major AI research event of the last week was CVPR 2020, the pre-eminent computer vision conference. As always, there were many fascinating papers presented, so I decided to draw this week’s research from that conference. One paper that caught my eye was ‘The Secret Revealer: Generative Model-Inversion Attacks against Deep Neural Networks’ by Zhang et al., a research team from China and the US¹. In it, they demonstrated a novel model-inversion attack that can extract information about the training data from a deep neural network.
Model-inversion attacks are especially dangerous for models trained on sensitive data, such as healthcare records or facial image datasets. The running example in this research was a white-box attack on a facial recognition classifier that aims to recover the face images used in training. The researchers’ method involved training GANs on public ‘auxiliary data,’ which in their example was defined as images in which the faces were blurred; training on this data teaches the generator to produce realistic face images. The next step was to use the trained generator to recover the missing sensitive regions in the image, a step framed as an optimization problem over the generator’s latent space.
The researchers found that their method compared favorably with previous state-of-the-art methods on this task. They also made two further empirical observations that I would like to reiterate. First, models with high predictive power can be attacked with higher accuracy, because such models build strong correlations between features and labels, and this characteristic is exactly what model-inversion attacks exploit. Second, differential privacy could not protect against this method of attack: it obscures the contribution of individual training records with statistical noise, but it does not aim to conceal the private attributes that the attack recovers. This raises questions about models that rely on differential privacy for information security.
Unsupervised Learning of 3D Objects from 2D Images
I also wanted to mention the Best Paper Award winner from CVPR 2020, ‘Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild,’ by Wu et al.². They proposed a method of learning 3D objects from single-view images without any external supervision. The method centers on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination components. Many of the results presented in the paper are noteworthy, and I highly recommend reading it and trying the demo available on their GitHub page.
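As a rough illustration of the image-formation process such an autoencoder learns to invert, here is a toy Lambertian renderer that combines a depth map, an albedo map and a light direction into an image. The depth bump, uniform albedo, light direction and shading coefficients below are made-up inputs, and the paper’s full pipeline additionally relies on a symmetry prior and confidence maps not shown here:

```python
import numpy as np

def normals_from_depth(depth):
    """Unit surface normals from finite differences of a depth map."""
    dz_dy, dz_dx = np.gradient(depth)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def render(depth, albedo, light, ambient=0.3, diffuse=0.7):
    """Toy Lambertian shading: image = albedo * (ambient + diffuse * <n, l>)."""
    n = normals_from_depth(depth)
    shading = ambient + diffuse * np.clip(n @ light, 0.0, None)
    return albedo * shading

# Made-up scene: a smooth depth bump, uniform albedo, oblique light.
h = w = 32
yy, xx = np.mgrid[0:h, 0:w]
depth = np.exp(-((xx - w / 2) ** 2 + (yy - h / 2) ** 2) / 100.0)
albedo = np.full((h, w), 0.8)
light = np.array([0.3, 0.3, 0.9])
light = light / np.linalg.norm(light)

img = render(depth, albedo, light)
```

The autoencoder in the paper runs this pipeline in reverse: given only `img`, it must estimate the depth, albedo, viewpoint and illumination that produced it.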