Feature Engineering is one of the core techniques that can be used to increase the chances of success in solving Machine Learning problems. As a part of feature engineering, Feature Learning (also called Representation Learning) is a technique that can be used to derive new features in your dataset.
Existing features are sometimes not good enough, and having the right input features is an important prerequisite for training high-quality ML models. Hence, feature engineering is used to transform sets of inputs into new, more useful inputs for a particular problem.
Supervised Feature Learning:
New features are learned using data that has already been labeled. Typical use cases include:
- Datasets that have multiple categorical features with high cardinality
- Image classification
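For the high-cardinality case, one widely used supervised technique is target (mean) encoding: each category is replaced by the average label observed for it, so the new feature is derived directly from the labels. The sketch below is a minimal illustration with made-up data; the column values and function name are assumptions, not part of any specific library.

```python
from collections import defaultdict

def target_encode(categories, labels):
    """Replace each category with the mean of the labels seen for it.

    A minimal illustration of supervised feature learning: the new
    numeric feature is derived from the label column, so it only
    works on data that has already been labeled.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, labels):
        sums[cat] += y
        counts[cat] += 1
    encoding = {cat: sums[cat] / counts[cat] for cat in sums}
    return [encoding[cat] for cat in categories]

# Hypothetical data: a high-cardinality "city" column and a binary label.
cities = ["paris", "tokyo", "paris", "lima", "tokyo", "paris"]
labels = [1, 0, 1, 0, 1, 0]
encoded = target_encode(cities, labels)
```

In practice the encoding is usually computed with smoothing and out-of-fold statistics to avoid target leakage; this sketch omits both for brevity.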
Unsupervised Feature Learning:
New features are learned without labeled input data. With this approach, labels are not needed because new features can be learned using the following algorithms and then used as inputs for model training.
A typical form of unsupervised feature learning is clustering: when the data is split into clusters, the cluster identifier assigned to each row can act as a new feature. Other common algorithms include:
- Principal Component Analysis
- Independent Component Analysis
- Matrix Factorization
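As a concrete example of the first algorithm in the list, here is a minimal PCA via the eigendecomposition of the covariance matrix; the projected columns serve as new, unsupervised features. This is a sketch for illustration; real pipelines would typically use a library implementation.

```python
import numpy as np

def pca_features(X, n_components):
    """Project X onto its top principal components.

    The returned columns are new features learned from the data
    alone, with no labels involved.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh returns eigenvalues in ascending order; pick the largest ones.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return X_centered @ top

# Hypothetical 3-feature dataset reduced to 2 learned features.
X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.9],
              [2.2, 2.9, 0.4],
              [1.9, 2.2, 0.8],
              [3.1, 3.0, 0.2]])
Z = pca_features(X, n_components=2)
```

By construction, the first learned feature captures at least as much variance as the second.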
Image Classification with Convolutional Neural Networks (CNNs):
One of the applications of feature learning is Image Classification, where Deep Learning models with their hidden layers are capable of automatically learning features from images.
A convolutional neural network is a special type of deep neural network whose hidden layers have some very specific abilities for learning hidden features. A typical convolutional neural network consists of a stack of two types of hidden layers: the convolution layer and the max-pooling layer. Finally, a densely connected output layer produces the final classification.
These layers work together to:
- Learn local patterns: convolutional layers are translation invariant, i.e., once the model learns to recognize a certain type of local pattern somewhere in the image, it can also recognize it anywhere else in the image. That is what makes the approach so powerful.
- Down-sample and make translation-invariant: the max-pooling layer down-samples the data over small windows, resulting in a representation that is more robust to translation. The max-pooling layer is what enables such a classification approach to recognize a human face even if the picture is taken from different angles.
- Densely connect all outputs to learn the classification: the outputs of multiple stacks of convolutional and max-pooling layers are connected to densely connected hidden layers with a large number of nodes, which together form the classification model.
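The two hidden-layer types described above can be sketched directly in NumPy. This shows only the forward operations on a toy image (no training, and the hand-picked edge-detecting kernel is an assumption for illustration); in a real network the kernel weights are learned.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most
    deep learning frameworks): slide the kernel over the image and
    take a weighted sum at every position."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Down-sample by taking the maximum over non-overlapping
    size x size windows."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = feature_map[i * size:(i + 1) * size,
                                    j * size:(j + 1) * size].max()
    return out

# A 6x6 toy "image" with a vertical edge, and a vertical-edge kernel.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])
fmap = conv2d(image, kernel)     # responds strongly along the edge
pooled = max_pool(fmap, size=2)  # down-sampled feature map
```

Note that the peak response in `fmap` survives the pooling step even though the output is a quarter of the size, which is the intuition behind the translation robustness described above.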
Image Search with Autoencoders:
Another application of feature learning is Image Search, where a neural network is again used to learn features from images, this time to search for similar images rather than classify them.
An autoencoder is a special type of neural network used for unsupervised learning. Its defining property is that it is trained to reproduce its own inputs as accurately as possible.
It typically starts with a very large number of inputs. Each layer becomes progressively narrower than the previous one until the minimum layer width is reached. Afterward, layers start to have increasingly larger widths until the last one, which must have the same number of outputs as there are inputs in the first layer.
The middle layer has the smallest width and is of particular importance for two reasons:
- It marks the border between the two internal components of the autoencoder: the Encoder (everything to the left of that layer) and the Decoder (everything to the right of it).
- It produces a Feature Vector, i.e., a compressed representation of the input (compressed in the sense of reduced dimensionality, not reduced file size).
The encoder translates each image into an n-dimensional feature vector, where n is the width of the middle layer. These vectors can then be compared with each other using a distance metric such as the well-known Euclidean distance.
Once the autoencoder is trained, the encoder alone can be used to embed inputs into feature vectors. Finally, a distance measure is applied to identify similar images: if the distance between two feature vectors is below a certain threshold, the corresponding images are likely to be similar.
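The comparison step can be sketched as follows. The feature vectors below are made up for illustration; in a real system they would be produced by the trained encoder half of the autoencoder, and the threshold would be tuned on held-out data.

```python
import numpy as np

def find_similar(query_vec, index_vecs, threshold):
    """Return indices of stored feature vectors whose Euclidean
    distance to the query falls below the threshold."""
    query_vec = np.asarray(query_vec, dtype=float)
    index_vecs = np.asarray(index_vecs, dtype=float)
    dists = np.linalg.norm(index_vecs - query_vec, axis=1)
    return [i for i, d in enumerate(dists) if d < threshold]

# Hypothetical 3-dimensional feature vectors for four indexed images.
index_vecs = [[0.1, 0.9, 0.0],
              [0.8, 0.1, 0.3],
              [0.2, 0.8, 0.1],
              [0.9, 0.9, 0.9]]
query = [0.15, 0.85, 0.05]
matches = find_similar(query, index_vecs, threshold=0.3)
```

A brute-force scan like this works for small collections; large image indexes typically switch to an approximate nearest-neighbor structure, but the distance-below-threshold idea is the same.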