Developers frequently turn to autoencoders to organize data for machine learning algorithms, improving the algorithms' efficiency and accuracy with less effort from data scientists.
Data scientists can add autoencoders to their toolkit for applications that require data denoising, nonlinear dimensionality reduction, sequence-to-sequence prediction and feature extraction. Autoencoders have a particular advantage over classic machine learning techniques such as principal component analysis for dimensionality reduction: they can learn nonlinear representations of the data, and they work particularly well for feature extraction.
Until recently, the study of autoencoders had primarily been an academic pursuit, said Nathan White, lead consultant at AIM Consulting. However, there are now many applications where machine learning practitioners should look to autoencoders as their tool of choice. But before diving into the top use cases, here’s a brief look into autoencoder technology.
An autoencoder consists of a pair of deep learning networks, an encoder and a decoder, that are trained together. The encoder learns an efficient way of encoding the input into a smaller, dense representation, called the bottleneck layer. The decoder learns to convert this representation back into a close approximation of the original input.
“The essential principle of an autoencoder is to distill the input into the smallest amount of data necessary to then reconstruct that original input with as little difference as possible between the input and the output,” said Pat Ryan, executive vice president of enterprise architecture at digital tech consultancy SPR.
The value of the autoencoder is that it removes noise from the input signal, leaving only a high-value representation of the input. With this, machine learning algorithms can perform better because they can learn the patterns in the data from a smaller set of high-value inputs, Ryan said.
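As a rough illustration of the encoder, bottleneck and decoder described above, here is a minimal sketch in Python with NumPy. It is not taken from any of the experts quoted here: the function name train_autoencoder, the toy three-feature dataset, the tanh activation and the training settings are all assumptions chosen to keep the example small.

```python
import numpy as np

def train_autoencoder(X, bottleneck=2, epochs=2000, lr=0.05, seed=0):
    """Fit a one-hidden-layer autoencoder with full-batch gradient descent.

    Hypothetical helper for illustration only; real projects would normally
    use a deep learning framework instead of hand-written gradients.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, size=(d, bottleneck))
    b_enc = np.zeros(bottleneck)
    W_dec = rng.normal(0.0, 0.1, size=(bottleneck, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W_enc + b_enc)   # encoder: input -> bottleneck code
        X_hat = Z @ W_dec + b_dec        # decoder: code -> reconstruction
        err = X_hat - X
        # Reconstruction loss: mean squared error per sample.
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Backpropagate the loss through the decoder and the encoder.
        g_out = 2.0 * err / n
        g_pre = (g_out @ W_dec.T) * (1.0 - Z ** 2)   # derivative of tanh
        W_dec -= lr * (Z.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)

    def encode(A):
        return np.tanh(A @ W_enc + b_enc)

    def decode(C):
        return C @ W_dec + b_dec

    return encode, decode, losses

# Toy data (an assumption): three features that all depend on one hidden value t.
rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([t, t ** 2, np.sin(3.0 * t)])

encode, decode, losses = train_autoencoder(X)
codes = encode(X)   # the smaller representation other models could consume
print("code shape:", codes.shape)
print("loss before and after training:", losses[0], losses[-1])
```

The input and the training target are the same array, which is why no labels are needed. The codes array is the bottleneck output that a downstream model would use in place of the raw features.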
Autoencoders, unsupervised neural networks, are proving useful in machine learning domains with extremely high data dimensionality and nonlinear properties such as video, image or voice applications.
Advantages of autoencoders
One important characteristic of autoencoders is that they can work in an unsupervised manner, which eliminates the need to label the training data, whether by hand or artificially.
“[Autoencoders] are unique in that they leverage the benefits of supervised learning without the need for manual annotation, since inputs and outputs of the network are the same,” said Sriram Narasimhan, vice president for artificial intelligence and analytics at IT service firm Cognizant.
A second big advantage is that they can automatically find ways to transform raw media files such as pictures and audio into a form more suitable for machine learning algorithms. MingKuan Liu, senior director of data science for Appen, an AI training data annotation tools provider, said that autoencoders’ ability to glean information from media makes the tool particularly useful for computer vision applications such as feature extraction, synthetic data generation, disentanglement learning and saliency learning.
Data scientists should treat autoencoders as a complement to supervised techniques rather than a complete replacement. Supervised machine learning algorithms trained on large amounts of high-quality labeled data are still the top choice across almost all industry AI use cases, Liu said.
Top 7 use cases for autoencoders
When used as a proper tool to augment machine learning projects, autoencoders have enormous data cleansing and engineering power.
- Feature extractor
Russ Felker, the CTO of GlobalTranz, a logistics service and freight management provider, said that using autoencoders as a feature extractor removes the need to go through hours of laborious feature engineering after data cleansing. This can allow for data classification to be completed more easily.
“By grouping like items together, you are enabling the system to make fast recommendations on what the output should be,” Felker said.
- Dimensionality reduction
Autoencoders for dimensionality reduction compress the input into the smallest representation that can still reproduce the input with minimal loss.
“In this case, the goal is not necessarily to reproduce the input, but instead to use the smaller representation from the encoder in other machine learning models,” said Ryan. This is particularly important when the inputs have a nonlinear relationship with each other. However, data scientists should consider other techniques like principal component analysis when the input data has a linear correlation.
“PCA is computationally a cheaper method to reduce dimensionality in case of linear data systems,” Narasimhan said.
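The point about linear data can be sketched with a few lines of NumPy. The example below is an illustration under stated assumptions, not a benchmark: the helper names pca_fit and pca_reconstruction_error and both synthetic datasets are made up for this sketch. It computes PCA with a single SVD and compares how well it reconstructs data that is linear in its hidden factors versus data that lies on a curve.

```python
import numpy as np

def pca_fit(X, k):
    """Return the feature means and the top-k principal directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_reconstruction_error(X, k):
    """Project onto k components, map back, and report mean squared error per sample."""
    mean, components = pca_fit(X, k)
    codes = (X - mean) @ components.T        # reduced representation
    X_hat = codes @ components + mean        # linear reconstruction
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))

rng = np.random.default_rng(0)

# Linear case (assumed data): 5 features that are linear mixes of 2 hidden factors.
latent = rng.normal(size=(300, 2))
X_linear = latent @ rng.normal(size=(2, 5))

# Nonlinear case (assumed data): points on a circle, driven by 1 hidden angle.
angle = rng.uniform(0.0, 2.0 * np.pi, size=300)
X_circle = np.column_stack([np.cos(angle), np.sin(angle)])

linear_err = pca_reconstruction_error(X_linear, k=2)
circle_err = pca_reconstruction_error(X_circle, k=1)
print("linear data, 2 components:", linear_err)
print("circle data, 1 component:", circle_err)
```

On the linear dataset, two components recover the input almost exactly with no iterative training. On the circle, a single linear component cannot follow the curve even though one number (the angle) describes each point, which is the kind of structure a nonlinear autoencoder is meant to capture.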
- Image compression
Researchers are also starting to explore ways that autoencoders can be used to improve compression ratios for video and images compared to traditional statistical techniques. Narasimhan said researchers are developing special autoencoders that can compress pictures shot at very high resolution to one-quarter or less of the size required with traditional compression techniques. In these cases, the focus is on making images appear similar to the human eye for a specific type of content. Pictures of people, buildings or natural environments might each benefit from a different autoencoder that can resize and compress large images in that category.
- Data encoding
Autoencoders particularly shine at finding better ways of representing raw media data for either searching through this data or writing machine learning algorithms that use this data. In these cases, the output from the bottleneck layer between encoder and decoder is used to represent the raw data for the next algorithm.
For example, autoencoders are used in audio processing to convert raw data into a secondary vector space, in much the same way that word2vec prepares text data for natural language processing algorithms. This can make it easier to locate the occurrence of speech snippets in a large spoken archive without the need for speech-to-text conversion.
- Anomaly detection
Autoencoders used for anomaly detection rely on the measured loss between the input and the reconstructed output. If, after running a sample through the autoencoder, the error between the input and the output is too high, then the autoencoder cannot reconstruct that sample well, which marks it as anomalous relative to the training data.
Ryan said these kinds of techniques are used in the banking industry to help automate the generation of loan recommendation algorithms. For example, if a bank has a large amount of data about people and loans and can characterize certain loans that met qualifications as good, then this data can be used to characterize what good loans look like. The data from these good loans is used to create the autoencoder. If a data record is passed through the autoencoder, and the measured loss between the original input and the reconstructed output is too high, then this loan application can be flagged for additional review.
“It does not mean that the loan is a bad one to make, just that it is outside of the good loans the bank has seen in the past,” said Ryan.
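The flag-for-review logic can be sketched as follows. To keep the example short and deterministic, it swaps in a closed-form linear autoencoder (equivalent to PCA) where a bank would more likely train a nonlinear network. The synthetic "good loan" records, the six features, the 99th-percentile threshold and all function names are assumptions made for illustration.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Closed-form linear autoencoder with a k-dimensional bottleneck.

    Stand-in for a trained neural autoencoder; with linear layers and
    squared error, the optimal solution spans the top-k PCA directions.
    """
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:k]

    def reconstruct(A):
        codes = (A - mean) @ components.T     # encode
        return codes @ components + mean      # decode

    return reconstruct

def reconstruction_error(reconstruct, A):
    """Squared reconstruction error for each row of A."""
    return np.sum((A - reconstruct(A)) ** 2, axis=1)

rng = np.random.default_rng(0)

# Hypothetical records of past good loans: 6 numeric features that mostly
# follow 2 hidden factors, plus a little noise.
mixing = rng.normal(size=(2, 6))
good_loans = rng.normal(size=(500, 2)) @ mixing + 0.05 * rng.normal(size=(500, 6))

reconstruct = fit_linear_autoencoder(good_loans, k=2)
train_err = reconstruction_error(reconstruct, good_loans)

# Assumed policy: review anything that reconstructs worse than 99% of good loans.
threshold = np.percentile(train_err, 99)

# Two new applications: one drawn like the good loans, one that is not.
typical = rng.normal(size=(1, 2)) @ mixing + 0.05 * rng.normal(size=(1, 6))
unusual = 10.0 * rng.normal(size=(1, 6))
new_applications = np.vstack([typical, unusual])

needs_review = reconstruction_error(reconstruct, new_applications) > threshold
print("flag for additional review:", needs_review)
```

The threshold is a business decision rather than a property of the model. A lower percentile sends more applications to human review, and a higher one sends fewer.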
- Denoising
In some cases, a shipment may be missing some data within the series of transactions used to describe its status. Denoising autoencoders can help determine what is missing based on training data and generate a full picture of the shipment, said Felker. This can improve the performance of other algorithms that use this data for applications like predictive analytics.
In other cases, such as audio or video representation, denoising can reduce the impact of noise like speckles in images or hisses in sound that arose from problems capturing them.
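A denoising autoencoder is trained on corrupted inputs with the clean data as the target. The sketch below shows that training setup with a linear stand-in that can be solved in closed form (reduced-rank regression) instead of a neural network. The synthetic 20-feature signal, the noise level and the names fit_linear_denoiser and sample are assumptions for illustration, and the data is zero-mean by construction, so no centering step is included.

```python
import numpy as np

def fit_linear_denoiser(X_noisy, X_clean, k):
    """Closed-form linear denoising autoencoder with a k-dimensional bottleneck.

    Minimizes ||X_clean - X_noisy @ W_enc @ W_dec||^2 over rank-k maps.
    Stand-in for a neural denoising autoencoder trained by gradient descent.
    """
    # Unconstrained least-squares map from noisy inputs to clean targets.
    W_full, *_ = np.linalg.lstsq(X_noisy, X_clean, rcond=None)
    # Keep only the k strongest directions of the fitted outputs.
    _, _, Vt = np.linalg.svd(X_noisy @ W_full, full_matrices=False)
    W_dec = Vt[:k]               # decoder: code -> features, shape (k, d)
    W_enc = W_full @ W_dec.T     # encoder: features -> code, shape (d, k)
    return W_enc, W_dec

rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 20))   # assumed: 20 features driven by 2 hidden factors

def sample(n):
    """Draw clean records and a noise-corrupted copy of them."""
    clean = rng.normal(size=(n, 2)) @ basis
    noisy = clean + 0.5 * rng.normal(size=clean.shape)
    return clean, noisy

train_clean, train_noisy = sample(1000)
test_clean, test_noisy = sample(200)

W_enc, W_dec = fit_linear_denoiser(train_noisy, train_clean, k=2)
denoised = (test_noisy @ W_enc) @ W_dec

noisy_mse = float(np.mean((test_noisy - test_clean) ** 2))
denoised_mse = float(np.mean((denoised - test_clean) ** 2))
print("error before denoising:", noisy_mse)
print("error after denoising:", denoised_mse)
```

The same training pattern applies to missing values: blank out some fields in otherwise complete records, and train the model to restore the complete version.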
- Fraud detection
It can be challenging to train machine learning models to recognize fraudulent activity because fraudulent transactions make up such a small share of a company's total transactions. The versatility of autoencoders allows users to create data projections for representing fraudulent transactions compared to traditional methods, said Tom Shea, founder and CEO of OneStream Software, a corporate performance management software company.
Once trained, autoencoders can generate additional data points and create similar fraudulent transactions, providing a broader data set for machine learning models to learn from. Data scientists can also set up anomaly detection algorithms specific to fraud. They would train the algorithm using data from legitimate transactions. An alert would be raised when there is a significant difference between the raw data and the reconstructed data.
This is especially helpful in situations where we do not have enough historical samples of fraudulent transactions or when entirely new patterns of fraudulent transactions emerge, Narasimhan said.
Credit: Google News