Machine learning, for all its benevolent potential to detect cancers and create collision-proof self-driving cars, also threatens to upend our notions of what’s visible and hidden. It can, for instance, enable highly accurate facial recognition, see through the pixelation in photos, and even—as Facebook’s Cambridge Analytica scandal showed—use public social media data to predict more sensitive traits like someone’s political orientation.
Those same machine-learning applications, however, also suffer from a strange sort of blind spot that humans don’t—an inherent bug that can make an image classifier mistake a rifle for a helicopter, or make an autonomous vehicle blow through a stop sign. Those misclassifications, known as adversarial examples, have long been seen as a nagging weakness in machine-learning models. Just a few small tweaks to an image or a few additions of decoy data to a database can fool a system into coming to entirely wrong conclusions.
Now privacy-focused researchers, including teams at the Rochester Institute of Technology and Duke University, are exploring whether that Achilles’ heel could also protect your information. “Attackers are increasingly using machine learning to compromise user privacy,” says Neil Gong, a Duke computer science professor. “Attackers share in the power of machine learning and also its vulnerabilities. We can turn this vulnerability, adversarial examples, into a weapon to defend our privacy.”
A Dash of Fake Likes
Gong points to Facebook’s Cambridge Analytica incident as exactly the sort of privacy invasion he hopes to prevent: The data science firm paid thousands of Facebook users a few dollars each for answers to political and personal questions and then linked those answers with their public Facebook data to create a set of “training data.” When the firm then trained a machine-learning engine with that dataset, the resulting model could purportedly predict private political persuasions based only on public Facebook data.
Gong and his fellow Duke researcher Jinyuan Jia wondered if adversarial examples could have prevented that breach of privacy. If changing only a few pixels in a photo can trick a machine-learning-trained image recognition engine into confusing a rabbit and a turtle, could adding or subtracting a few Facebook likes from someone’s profile similarly exploit machine learning’s weaknesses?
To test that hypothesis, the Duke researchers used an analogous data set: reviews in the Google Play store. To mirror Cambridge Analytica, they collected thousands of ratings in Google’s app store submitted by users who had also revealed their location on a Google Plus profile. They then trained a machine-learning engine with that data set to try to predict the home city of users based only on their app ratings. They found that based only on those Google Play ratings, some machine-learning techniques could guess a user’s city on the first try with accuracy as high as 44 percent.
Once they’d built their machine-learning engine, the researchers tried to break it with adversarial examples. After tweaking the data a few different ways, they found that adding just three fake app ratings chosen to statistically point to an incorrect city, or taking revealing ratings away, introduced enough noise to reduce the accuracy of their engine’s prediction to no better than a random guess. They called the resulting system “AttriGuard,” a reference to protecting the data’s private attributes against machine-learning snoops. “With just a few changes, we could perturb a user’s profile so that an attacker’s accuracy is reduced back to that baseline,” Gong says.
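One simple way to picture the idea is the toy sketch below. It is not the researchers' system or data: the app names, cities, weights, and the greedy selection rule are all invented for illustration. The hypothetical "attacker" guesses a city by summing how strongly each rated app points to it, and the defender adds up to three fake ratings that push the guess toward a wrong city.

```python
# Toy illustration only: hypothetical apps and weights, not the researchers' model.
from typing import Dict, Set

# WEIGHTS[app][city]: how strongly rating an app points to a city (made-up numbers).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "transit_nyc":  {"New York": 2.0, "Chicago": 0.1},
    "bagel_finder": {"New York": 1.0, "Chicago": 0.2},
    "cta_tracker":  {"New York": 0.1, "Chicago": 2.0},
    "deep_dish":    {"New York": 0.2, "Chicago": 1.5},
    "lake_weather": {"New York": 0.1, "Chicago": 1.0},
}
CITIES = ["New York", "Chicago"]


def predict(profile: Set[str]) -> str:
    """Attacker's guess: the city with the highest summed weight."""
    totals = {c: sum(WEIGHTS[app][c] for app in profile) for c in CITIES}
    return max(CITIES, key=lambda c: totals[c])


def add_fake_ratings(profile: Set[str], target: str, budget: int = 3) -> Set[str]:
    """Greedily add up to `budget` fake ratings that favor the wrong `target` city."""
    perturbed = set(profile)
    for _ in range(budget):
        if predict(perturbed) == target:
            break  # the attacker's guess has already flipped
        candidates = [app for app in WEIGHTS if app not in perturbed]
        if not candidates:
            break
        # Pick the app that helps the target city most relative to the other cities.
        best = max(
            candidates,
            key=lambda app: WEIGHTS[app][target]
            - max(WEIGHTS[app][c] for c in CITIES if c != target),
        )
        perturbed.add(best)
    return perturbed


profile = {"transit_nyc", "bagel_finder"}
perturbed = add_fake_ratings(profile, target="Chicago")
print(predict(profile), "->", predict(perturbed))
print("fake ratings added:", sorted(perturbed - profile))
```

In this toy setup two fake ratings are enough to flip the guess. A real defense would also have to keep the changes small enough that the profile stays useful, which this sketch only approximates with the `budget` cap.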
The cat-and-mouse game of predicting and protecting private user data, Gong admits, doesn’t end there. If the machine-learning “attacker” is aware that adversarial examples may be protecting a data set from analysis, they can use what’s known as “adversarial training,” generating their own adversarial examples to include in a training data set so that the resulting machine-learning engine is far harder to fool. But the defender can respond by adding yet more adversarial examples to foil that more robust machine-learning engine, resulting in an endless tit-for-tat. “Even if the attacker uses so-called robust machine learning, we can still adjust our adversarial examples to evade those methods,” says Gong. “We can always find adversarial examples that defeat them.”
To Wiretap a Mockingbird
Another research group has experimented with a form of adversarial example data protection that’s intended to cut short that cat-and-mouse game. Researchers at the Rochester Institute of Technology and the University of Texas at Arlington looked at how adversarial examples could prevent a potential privacy leak in tools like VPNs and the anonymity software Tor, designed to hide the source and destination of online traffic. Attackers who can gain access to encrypted web browsing data in transit can in some cases use machine learning to spot patterns in the scrambled traffic that allow a snoop to predict which website, or even which specific page, a person is visiting. In their tests, the researchers found that the technique, known as web fingerprinting, could identify a website among a collection of 95 possibilities with up to 98 percent accuracy.
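To give a rough sense of why encrypted traffic can still leak which site is being visited, here is a minimal sketch. It is not the researchers' classifier: the traces, site labels, feature choices, and the nearest-neighbor rule are all assumptions made for illustration. Each trace records only packet sizes and directions, which remain visible even when the contents are encrypted.

```python
# Toy illustration only: made-up traces and features, not the researchers' classifier.
import math
from typing import List, Tuple

Trace = List[int]  # signed packet sizes: positive = outgoing, negative = incoming


def features(trace: Trace) -> Tuple[float, float, float, float]:
    """Summarize a trace by packet counts and byte totals in each direction."""
    outgoing = [p for p in trace if p > 0]
    incoming = [-p for p in trace if p < 0]
    return (
        float(len(outgoing)),
        float(len(incoming)),
        float(sum(outgoing)),
        float(sum(incoming)),
    )


def nearest_site(trace: Trace, labeled: List[Tuple[str, Trace]]) -> str:
    """Guess the site whose known trace is closest in feature space (1-nearest neighbor)."""
    observed = features(trace)
    label, _ = min(labeled, key=lambda item: math.dist(observed, features(item[1])))
    return label


# Hypothetical traces the snoop collected by visiting each site in advance.
TRAINING: List[Tuple[str, Trace]] = [
    ("news-site", [500, -1500, -1500, -1500, 500, -1500, -1500]),
    ("news-site", [520, -1500, -1500, -1400, 480, -1500, -1500, -900]),
    ("search-site", [300, -800, 300, -600]),
    ("search-site", [320, -750, 280, -650, 100]),
]

# An encrypted trace observed in transit; only sizes and directions are visible.
observed = [510, -1500, -1500, -1450, 490, -1500, -1400]
guess = nearest_site(observed, TRAINING)
print(guess)
```

A defense built on adversarial examples would aim to reshape a trace, for instance by adding dummy packets, so that a classifier like this one lands on the wrong site.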
Credit: Google News