There are more images in this world than just natural images. This is another research topic: how can a model understand advertisements? The thing is, these images might be much easier to classify, BUT they are MUCH harder to understand, since the model needs to understand both the objects and the culture.
This is really hard, since decoding the message is not easy. For example, an ad might connect a woman buying a product to the killing of an animal. These are just a few examples.
So a new dataset was created, but there is going to be a HUGE bias in this dataset. Most of the advertisements are for the North American market, which is okay. But the advertising itself is also going to carry biases, such as racism.
The model has to answer questions like the one in the example above, and that is HARD. This problem maps from a high-dimensional input to a HIGH-DIMENSIONAL output! It is not just classification, where the output is a one-dimensional vector of class scores. This is REALLY REALLY hard.
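To make the classification-vs-answering contrast concrete, here is a minimal sketch under loose assumptions. A classifier only picks one entry from a fixed label set, while an answering model has to compare the image with whole sentences. The embeddings are hypothetical toy vectors typed in by hand (producing good ones is the actual hard part), and ranking candidate sentences by cosine similarity is just one simple way to set up answering, not necessarily what the paper does. The names `classify` and `rank_answers` are made up for illustration.

```python
import math


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def classify(scores, labels):
    # Classification: the output is a single entry from a fixed label set,
    # chosen as the index of the highest class score.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


def rank_answers(image_vec, answer_vecs, answers):
    # Answering (hypothetical setup): compare the image embedding with the
    # embedding of each candidate sentence and sort sentences by similarity.
    sims = [cosine(image_vec, a) for a in answer_vecs]
    order = sorted(range(len(answers)), key=lambda i: sims[i], reverse=True)
    return [answers[i] for i in order]


if __name__ == "__main__":
    print(classify([0.2, 0.7, 0.1], ["car", "cigarette", "food"]))

    # Toy, hand-made embeddings; a real system would have to learn these.
    image_vec = [0.9, 0.1, 0.3]
    answers = [
        "I should buy this car because it is fast",
        "I should stop smoking because it harms my lungs",
    ]
    answer_vecs = [[0.8, 0.2, 0.4], [0.1, 0.9, 0.2]]
    print(rank_answers(image_vec, answer_vecs, answers))
```

The classifier returns "cigarette", a single label, while the answer ranker returns a list of full sentences with the car statement first. All of the difficulty described above is hidden inside where those sentence and image vectors come from.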
The example above is not really closely related; it is more or less emotion detection from faces.
For advertisements, we need a new decoding method. Some related work here is click-through rate prediction, but it uses lower-dimensional data.
So this is the dataset, and each image has several different kinds of labels; depending on which labels we use, we can tackle different problems. The annotation is really what limits which problems can be solved, unless there is some self-supervised learning method.
This is not an EASY task; there are so many levels for the model to understand… It will be solved one day, but NLP and computer vision have to come together.
So the first question to ask is, "Can we understand this ad only via visual concepts?" This is the right question to ask, because if the answer is yes, we might only need the object classes and their locations.
But understanding symbols is a hella different problem, and SUPER hard.
Even within the straightforward ads, there are many different kinds of advertisements in the distribution.
A dataset of many different YouTube video ads was also collected; this paper's MAIN contribution is these advertisement datasets.
A lot of the advertisements are amusing or interesting… Since YouTube is more or less for entertainment, this makes sense.
Yeah, this is actually a good result, considering how hard this problem is. Combined with the symbols, they are able to reach 50 percent, which is impressive!