How can we cluster images, even WITHOUT labels? And not based on colors, but on the object itself, the HIGHER-level features.
This is one hell of a research paper, ‘Learning to Classify Images without Labels’, and the key idea is self-supervised learning.
The three steps are self-supervised pretext training, clustering (a bit like KNN), and self-labeling. How can we do this?
How can we classify, even when we do NOT know the classes? If we have some kind of dataset, we can cluster the data points. But the thing is, we need to do this without the label information, and we want to be as general as possible. We have to cluster the data without actual labels.
Feature learning and classification are decoupled! Wow, that is their method, that is their secret.
What distance should we use? What kind of metric should the clustering be BASED upon?
Via a CNN, use the last-layer features. But what kind of network should we use? We could use something like DeepCluster, but those are older works and not the main method they used.
The method seems to involve a CNN acting as a prior and learning a good representation.
If we do this, the algorithm will focus on lower-level features such as colors, and that is the general result we DO see! How can we move this into a higher-level feature space? What can be done? These approaches also vary with the initialization and are not that robust.
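To make the criticism concrete, the naive baseline (clustering raw last-layer features directly, here with plain k-means) can be sketched in a few lines; the feature matrix and the first-k initialization are illustrative stand-ins, not the paper's setup:

```python
import numpy as np

def kmeans(features, k, iters=20):
    """Plain k-means over (pretend) last-layer CNN features.
    Without a good representation this latches onto low-level cues
    such as color, and the result depends on the initialization,
    which is exactly the robustness problem mentioned above."""
    centers = features[:k].copy()  # naive init: first k points
    for _ in range(iters):
        # assign each point to its nearest center
        dists = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels
```

On well-separated features this works fine; on raw pixel-level features the clusters end up following colors and lighting rather than objects.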
The solution was representation learning!
T() is a transformation, and the idea seems to be minimizing the distance between the embedding of the original image and the embedding of its transformed version. Objectives such as mutual information make a lot of sense here, since with MI we are working with higher-level features.
This is a good way for pre-training.
We are going to DESTROY the lower-level information, this is SO fucking smart! The representation no longer depends on pixel intensity.
Minimize the distance in the embedding space, but what kind of distance should we use? And we are going to use this as a pre-training method.
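A minimal sketch of that pretext objective, assuming the embeddings are already computed and L2-normalized; the squared Euclidean distance here is one plausible choice, not necessarily the paper's:

```python
import numpy as np

def invariance_loss(z_orig, z_aug):
    """Mean squared distance between the embedding of each image and the
    embedding of its transformed version T(image). Minimizing this pushes
    the network to ignore the transformation, i.e. the low-level details.
    z_orig, z_aug: (batch, dim) arrays of L2-normalized embeddings."""
    return float(np.mean(np.sum((z_orig - z_aug) ** 2, axis=1)))
```

Note that a constant embedding also drives this to zero, which is one reason the full pretext objectives (e.g. the mutual-information-based ones mentioned above) are more involved than this sketch.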
Then another network is trained; one input is an image and the other is its neighborhood set from the embedding network.
This is complicated, but such amazing work!
Self-supervised training: here the goal is to find a CREATIVE way to train the model, since we are trying to overcome the lack of labeled data.
There is also a ‘soft assignment’ step, creating a histogram over the different classes and matching the distribution across the classes.
Maximize the entropy; this pushes toward a uniform distribution and prevents clustering everything into a single cluster. This is to avoid the degenerate solution.
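Putting the two terms together (neighbor consistency plus the entropy term that prevents the degenerate one-cluster solution), here is a hedged numpy sketch of such a clustering loss; the entropy weight and the epsilon are assumptions, not the paper's exact values:

```python
import numpy as np

def clustering_loss(p_anchor, p_neighbor, entropy_weight=5.0, eps=1e-8):
    """p_anchor, p_neighbor: (batch, n_clusters) softmax outputs for each
    image and one of its mined nearest neighbors.
    Term 1 (consistency): make each image and its neighbor pick the same
    cluster, by maximizing the dot product of their probability vectors.
    Term 2 (entropy): keep the MEAN cluster distribution uniform, so the
    network cannot collapse everything into a single cluster."""
    consistency = -np.mean(np.log(np.sum(p_anchor * p_neighbor, axis=1) + eps))
    mean_p = p_anchor.mean(axis=0)
    entropy = -np.sum(mean_p * np.log(mean_p + eps))
    return consistency - entropy_weight * entropy  # entropy is MAXIMIZED
```

With confident assignments spread over all clusters the loss goes strongly negative, while collapsing everything into one cluster gives zero entropy and therefore a higher loss, which is exactly the degenerate solution being penalized.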
There are some mathematical reasons why the authors decided to formulate the problem in such a way.
The nearest neighbors are already good at finding similar images via higher-level features.
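Mining those neighbors from the pretext embeddings can be sketched as a brute-force cosine-similarity k-NN lookup (a real pipeline would likely use an approximate index instead):

```python
import numpy as np

def mine_neighbors(embeddings, k=3):
    """For each image, return the indices of its k nearest neighbors in
    the embedding space (cosine similarity on normalized vectors). These
    neighbor sets are what the clustering network consumes."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)            # never pick the image itself
    return np.argsort(-sim, axis=1)[:, :k]    # (n, k) neighbor indices
```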
What is SELF-LABELING? Isn’t that bad? NO!
The green line indicates the linear separation boundary for the final classifier. This is a HUGE algorithm; there are quite a lot of steps to consider.
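The self-labeling idea can be sketched as a simple confidence filter: keep only the samples the clustering network is very sure about, and reuse their predicted cluster as a pseudo-label for fine-tuning (the 0.99 threshold is an illustrative value, not the paper's):

```python
import numpy as np

def confident_pseudo_labels(probs, threshold=0.99):
    """probs: (n, n_clusters) softmax outputs of the clustering network.
    Returns the indices of confident samples and their pseudo-labels;
    these pairs are then used like ordinary labeled data, fine-tuning
    the network with a standard cross-entropy loss."""
    confidence = probs.max(axis=1)
    keep = np.nonzero(confidence > threshold)[0]
    return keep, probs.argmax(axis=1)[keep]
```

The filtered set grows as the network gets more confident, so the step can be repeated: label, fine-tune, re-label.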
But it shows how, by combining multiple models, we are able to achieve SUCH great results. This means that model size is NOT the name of the game (of course it plays a huge part); creativity, and how you formulate the problem (in the loss function), play a much more important role.
If we have HUGE amounts of data, we are able to get much higher classification accuracy. And there are so many tricks in the training step.
Removing samples, or using only the confident ones, gives interesting results. What about ImageNet?
And it does well on ImageNet too! A very good way to scale!
There is a drop in accuracy if we over-cluster, because in the real world we do not know how many clusters we have before starting. There is going to be a drop.