19th-century technological evolution offers a lesson for us
Our story begins in London, more specifically at Buckingham Palace.
Queen Victoria and her husband, Prince Albert, had a hobby.
Whenever they had the chance, they entertained themselves by making drawings and etchings of their relatives and other domestic subjects.
These illustrations were so well made that Prince Albert decided to have them printed privately, as presents for friends and kin.
Somehow, the catalog of these etchings came into the hands of William Strange, a London printer and publisher. Without asking Prince Albert for any permission, he published the catalog.
Once the case was brought before the Court of Chancery, Lord Chancellor Cottenham observed that Strange’s behavior towards Prince Albert amounted to a “breach of trust”.
In other words, Strange used the materials in ways he shouldn’t have.
This case is considered a milestone in the evolution of the concepts of privacy and copyright.
Ironically, one hundred and seventy years later, privacy and copyright infringement still, in some way, go hand in hand.
Intellectual property is at stake in both cases; only today it is no longer etchings, but HD digital portraits under Creative Commons licenses.
In fact, especially in Western societies, when new technological developments step in and open up new ways of doing things, sooner or later we also see the rise of new rights and regulations exactly where those technologies operate.
That was true for the publishing industry as it flourished in the 19th century, and in that same period many countries saw a strong development of copyright law.
As sensationalist newspapers became popular and snapshot cameras grew more accessible to the general public in the United States, Warren and Brandeis formulated the right to privacy for the very first time, back in 1890, starting from remarks like this:
“instantaneous photographs and newspaper enterprise have invaded the sacred precincts of private and domestic life.”
If only they could see Facebook and Instagram…
So, what about nowadays?
Earlier in this article I showed how an artificial intelligence project was trained on special categories of personal data of European citizens, such as the biometric data in the portraits of human faces included in the training set. Even assuming that no consent had to be collected from these people because the project was carried out for scientific purposes, the authors did not disclose the safeguards they adopted in conducting the research in compliance with the GDPR, as required by its Article 89.
The thing is: that’s not the only legal violation behind the page thispersondoesnotexist.com.
The GitHub project page in fact reports that “only images under permissive licenses were collected”. Indeed, on the same page you can take a look at the metadata file, which includes information for each image, such as author name, Flickr URL, license details, etc.
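Just to make this concrete, here is a minimal sketch of how one could use such a metadata file to see which licenses actually appear in the dataset. The file name and field names below are assumptions based on the description above, not the project’s exact schema.

```python
import json
from collections import Counter

# Hypothetical metadata file: one record per image with author name,
# Flickr URL and license details (file and field names are assumptions).
with open("ffhq-metadata.json", "r", encoding="utf-8") as f:
    metadata = json.load(f)

license_counts = Counter()
for image_id, record in metadata.items():
    info = record.get("metadata", {})
    license_counts[info.get("license", "unknown")] += 1

# A quick tally shows which Creative Commons variants are present
# and whether anything non-permissive slipped through.
for license_name, count in license_counts.most_common():
    print(f"{license_name}: {count} images")
```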
I don’t know how they downloaded 70,000 images from Flickr in bulk, but looking at the public folders in their Google Drive it seems there were not many rules: as long as an image is in HD, contains a human face and carries a permissive license, it is a good fit for the project and can become part of the dataset. In any case, if there is an issue, our toolkit will take care of it, right?
If only the world were that easy…
That’s because on Flickr, alongside users who simply upload their photos as they are, other users, like professional photographers but sometimes also amateurs, upload pictures that contain distinctive signs and watermarks.
What is a watermark?
Do you remember when, browsing Google Images, you finally found a beautiful picture you wanted, but it carried an annoying mark or the name of its author? That bothersome thing is a watermark. Authors embed it in their own images to let everyone know who owns the work, and no one should remove it. Here is an example from Flickr that was also used for the “StyleGAN” project.
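For the sake of illustration, this is roughly how a photographer might stamp a visible watermark into a corner of a photo, sketched here with the Pillow library; the file paths and the credit text are placeholders.

```python
from PIL import Image, ImageDraw, ImageFont

# Placeholder input: any photo the author wants to mark as theirs.
photo = Image.open("portrait.jpg").convert("RGBA")
overlay = Image.new("RGBA", photo.size, (0, 0, 0, 0))
draw = ImageDraw.Draw(overlay)

text = "© Jane Doe Photography"   # placeholder credit line
font = ImageFont.load_default()
bbox = draw.textbbox((0, 0), text, font=font)
text_w, text_h = bbox[2] - bbox[0], bbox[3] - bbox[1]

# Semi-transparent white text anchored near the bottom-right corner.
margin = 10
draw.text((photo.width - text_w - margin, photo.height - text_h - margin),
          text, font=font, fill=(255, 255, 255, 128))

watermarked = Image.alpha_composite(photo, overlay).convert("RGB")
watermarked.save("portrait_watermarked.jpg")
```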
Among the 70,000 images in the dataset there are, as you can imagine, pictures of many kinds, including of course pictures with watermarks. Here are other examples.
In the next screenshots, the first picture comes from the folder of original Flickr images (named “in-the-wild-images”), while the second comes from the folder (named “images1024x1024”) where the Flickr images have already been aligned and cropped, ready to be used.
Look at the “21113.png” file.
As you can see, the problem is that the preprocessing step cropped the watermark away. And, of course, the watermark is nowhere to be seen in the final picture. A simplified sketch of how such a crop can silently drop a watermark follows the examples below.
Other examples from the dataset:
Look at the “20768.png” file;
and now at the “22779.png” file.
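To see how a watermark can vanish without anyone explicitly deleting it, here is a simplified sketch of a face-centered crop like the one described above. This is not the project’s actual alignment code, which works from facial landmarks; the face box coordinates here are placeholders.

```python
from PIL import Image

# Simplified illustration of the aligning/cropping step described above.
# The real pipeline locates the face automatically; here the face region
# is hard-coded as a placeholder.
wild = Image.open("in-the-wild-images/21113.png")   # original Flickr photo
face_box = (820, 310, 1844, 1334)                    # left, top, right, bottom

# Crop tightly around the face and resize to the dataset resolution.
# A watermark sitting in a corner of the original photo simply falls
# outside this box, so it never appears in the final 1024x1024 image.
aligned = wild.crop(face_box).resize((1024, 1024), Image.LANCZOS)
aligned.save("images1024x1024/21113.png")
```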
But wait: in our case all the images in the dataset have permissive licenses, so who cares about cutting away watermarks?
In truth, by doing this you silently slip onto dangerous ground.
As reported, for example, here, there are legal concerns with removing watermarks, even from images under CC licenses (as in our case), because there is a high risk of copyright infringement.
I contacted the StyleGAN team about this issue. I asked them why they also used this kind of images in the training set and whether they had special authorization from the authors of those images to remove their watermarks, but I didn’t receive any answer.
There are many clues that not enough attention was paid to taking care of personal data such as our faces, especially in a period like this of rising concern about the topic. This video of the project talks about “a new generator that automatically learns to separate different aspects of the images without any human supervision”.
Indeed, paying extra attention would be appreciated and, sometimes, required.