The Commonwealth Scientific and Industrial Research Organisation’s (CSIRO) Data61 developed an artificial intelligence (AI)-based system for the Australian Federal Police (AFP) to analyse potentially harmful data without the need to view it.
Working alongside the AFP and Monash University, Data61 developed the Data Airlock platform that uses AI and machine learning to scan through and filter confronting images faster than previous methods, while also keeping analytics secure and restricted.
“Data Airlock focuses on three key principles: Protecting people from data, protecting data from people, and analysing sensitive data in a safe and secure manner,” Data61’s Dr Surya Nepal, Board Sector Manager on the Data Airlock development task force, said.
According to Data61, interest in Data Airlock has extended to the Department of Home Affairs, New South Wales Police, and Australian Institute of Health and Welfare (AIHW).
As a result, Data61 researchers are working to adapt the platform to those organisations' specific needs.
Within the next 12 months, Data61 said it plans to equip Data Airlock with cryptography and differential-privacy algorithms to improve its usability in domains including healthcare.
The idea for Data Airlock arose when Janis Dalins, a Senior Digital Forensic Examiner at the AFP then undertaking his PhD at Monash University, ran into a problem in his research: he needed to give his supervisor access to the data he was analysing.
Dalins approached Data61 to help develop a solution that would grant his supervisor that access, and that the AFP could continue to use going forward.
Expanding on the initiative, Data61 said that in a child exploitation investigation, AFP officers are generally given only a couple of days to review a large amount of material before it must be presented in court to secure a conviction.
“The original method required officers to view thousands of images, comparing photo files to identify similarities,” the organisation explained in a blog post.
In 2018, a method known as perceptual hashing was introduced. It uses algorithms to look for similarities between the content of images, generating a digital fingerprint that can identify various forms of material.
“Perceptual hashing measures recurring similarities to predict potential outcomes, in this case, if material was created by the same person or included the same people,” Nepal said.
“The old approach lacks predictive analysis. When there’s a minor amount of distortion in the image, such as changing a pixel, that can change the whole hash. So that means the two images, which are perceptually similar, can no longer be detected.”
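The contrast Nepal draws can be sketched in a few lines. The toy "average hash" below is one of the simplest perceptual hashes: each pixel contributes one bit depending on whether it is brighter than the image's mean. The pixel grids and helper names here are illustrative assumptions, not part of Data Airlock itself; real systems hash resized, filtered images.

```python
# Toy "average hash", a simple perceptual hash: one bit per pixel,
# set if the pixel is brighter than the image mean. Illustrative
# values only; not Data Airlock's actual algorithm.

def average_hash(pixels):
    """Return a bit list: 1 where a pixel exceeds the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming_distance(h1, h2):
    """Count how many bits differ between two hashes."""
    return sum(a != b for a, b in zip(h1, h2))

# A tiny 4x4 grayscale "image".
image = [
    [200, 210, 50, 40],
    [190, 220, 60, 30],
    [80, 70, 90, 100],
    [60, 50, 110, 120],
]

# A near-duplicate: one pixel nudged slightly, the kind of minor
# distortion that completely changes a cryptographic hash.
distorted = [row[:] for row in image]
distorted[0][0] = 195

h1 = average_hash(image)
h2 = average_hash(distorted)
print(hamming_distance(h1, h2))  # 0 — the perceptual hashes still match
```

A cryptographic hash of the two files' raw bytes would be entirely different, which is exactly the weakness Nepal describes: perceptually similar images could no longer be matched.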
Data Airlock provides an isolated, secure environment where researchers can place their algorithms and models, execute them against the data, and retrieve the results.
“That data never leaves the data owner’s data-isolated environment, so that owner has full control of that data all the time,” Nepal continued.
According to Data61, the design enables researchers to develop new algorithms against sensitive data without being exposed to the data, using a Model-to-Data (MTD) paradigm. The organisation said this keeps information in secure vaults and permits only manually vetted algorithms to operate on the data in isolated “airlock” environments.
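The model-to-data flow described above can be sketched minimally: the analysis function travels to the data, only vetted functions may run, and only their aggregate output leaves the vault. All names, the vetting mechanism, and the sample data below are illustrative assumptions, not Data Airlock's actual API.

```python
# Hypothetical sketch of the Model-to-Data (MTD) idea: researchers
# submit a "model" (here, a function); only manually vetted models
# run against the vault, and only their results come back out.
# Everything here is illustrative, not Data Airlock's real interface.

SENSITIVE_VAULT = [12, 7, 19, 3, 25]  # stand-in for sensitive records

VETTED = set()

def vet(fn):
    """Stand-in for the manual vetting step: register an approved model."""
    VETTED.add(fn)
    return fn

def run_in_airlock(fn):
    """Execute a vetted model against the vault; raw data never leaves."""
    if fn not in VETTED:
        raise PermissionError("model has not been vetted")
    return fn(SENSITIVE_VAULT)

@vet
def mean_model(records):
    # The researcher's analysis returns only an aggregate, never the records.
    return sum(records) / len(records)

print(run_in_airlock(mean_model))  # 13.2
```

The design choice this illustrates is the one Nepal describes: the researcher never sees the records, only the output of an approved computation, so the data owner retains control at all times.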
“Organisations with sensitive data want to have universities or the researchers involved in analysing their data, but at the same time, they don’t want to release their data to them,” Nepal said.
“We provide an environment where new innovations can be created by engaging with researchers in universities, without them having access to the data.”