This fully convolutional network can perform voice style transfer, a process broadly similar to image style transfer, but applied to audio.
This is super cool: given that the results were produced by zero-shot transfer, they are pretty impressive. In all honesty I was a bit disappointed, but I still appreciate this research.
There is a lot of research on voice transfer, and zero-shot approaches seem to be a very powerful way to make it happen.
The model uses quite a few encoders. An encoder can be thought of as a function that maps high-dimensional data into a lower-dimensional representation.
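To make the idea of an encoder concrete, here is a minimal NumPy sketch of a toy encoder that maps an 80-dimensional frame (a common mel-spectrogram frame size, assumed here) down to a 16-dimensional code. The random weights, the tanh nonlinearity, and the name `make_linear_encoder` are hypothetical illustrations; the encoders in the paper are learned convolutional networks, not this toy.

```python
import numpy as np

def make_linear_encoder(in_dim: int, out_dim: int, seed: int = 0):
    """Return a toy encoder: a fixed random linear map followed by tanh.

    Hypothetical illustration only: the weights are random here,
    whereas a real encoder would learn them during training.
    """
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)

    def encode(x: np.ndarray) -> np.ndarray:
        # x has shape (in_dim,) or (batch, in_dim); the last axis shrinks to out_dim
        return np.tanh(x @ weights.T)

    return encode

encode = make_linear_encoder(in_dim=80, out_dim=16)
frame = np.ones(80)            # one 80-dimensional input frame
code = encode(frame)           # its 16-dimensional code
print(frame.shape, "->", code.shape)  # (80,) -> (16,)
```

In a trained model, the weights would be fitted so that the low-dimensional code keeps the information the rest of the network needs.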
Audio problems like this are usually tackled with mel spectrograms rather than by working directly with raw audio signals.
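A mel spectrogram is a short-time power spectrum whose frequency axis is warped onto the perceptual mel scale. The from-scratch NumPy sketch below computes a log-mel spectrogram of a 440 Hz tone. The sample rate (16 kHz), FFT size (1024), hop length (256), 80 mel bands, and the HTK-style formula m = 2595 * log10(1 + f / 700) are common choices assumed here, not settings taken from the paper. In practice a library routine such as `librosa.feature.melspectrogram` would normally be used instead.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumed variant; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))  # peak value 1 at the center
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=1024, hop=256, n_mels=80):
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping windowed frames
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                       # (n_mels, n_frames)

sr = 16000
t = np.arange(sr) / sr                # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)  # a 440 Hz sine wave
spec = log_mel_spectrogram(tone, sr=sr)
print(spec.shape)  # (80, 59)
```

The result is a compact 80 x 59 image-like array, which is much easier for a convolutional network to work with than 16,000 raw samples.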
The proposed method is built entirely from convolutional layers, which is a real advantage: it can be applied to audio clips of different lengths.
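Because a convolutional layer slides the same small kernel along the time axis, nothing in an all-convolutional network fixes the input length. The NumPy sketch below applies one two-layer convolutional stack to mel spectrograms with 50 and 173 frames and gets outputs of matching length. The random weights, the layer sizes (80 to 64 to 80 channels), the kernel size of 5, and the helper names `conv1d_same` and `tiny_fcn` are hypothetical, chosen only for illustration.

```python
import numpy as np

def conv1d_same(x, weights):
    """1-D convolution over time with 'same' zero padding.

    x: (in_channels, time); weights: (out_channels, in_channels, kernel_size), odd kernel.
    As in most deep-learning libraries, this is technically cross-correlation.
    Returns (out_channels, time): the time length is never fixed in advance.
    """
    out_ch, in_ch, k = weights.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    out = np.zeros((out_ch, T))
    for t in range(T):
        window = xp[:, t : t + k]  # (in_ch, k) slice centred on frame t
        out[:, t] = np.tensordot(weights, window, axes=([1, 2], [0, 1]))
    return out

rng = np.random.default_rng(0)
w1 = rng.standard_normal((64, 80, 5)) * 0.05  # hypothetical layer 1: 80 -> 64 channels
w2 = rng.standard_normal((80, 64, 5)) * 0.05  # hypothetical layer 2: 64 -> 80 channels

def tiny_fcn(mel):
    hidden = np.maximum(0.0, conv1d_same(mel, w1))  # ReLU
    return conv1d_same(hidden, w2)

for frames in (50, 173):
    mel = rng.standard_normal((80, frames))  # stand-in for a real mel spectrogram
    print(mel.shape, "->", tiny_fcn(mel).shape)
# (80, 50) -> (80, 50)
# (80, 173) -> (80, 173)
```

A network with a fully connected layer spanning the whole time axis would instead need a fixed number of frames, which is why an all-convolutional design handles variable-length audio more naturally.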
They used cosine similarity to measure how close different speakers are to one another. Interesting.
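Cosine similarity compares two vectors by the angle between them: cos θ = (a · b) / (‖a‖ ‖b‖), which is 1 for vectors pointing the same way, 0 for orthogonal ones, and -1 for opposite ones. The sketch below assumes each speaker is represented by a fixed-length embedding vector; the 4-dimensional vectors are made up for illustration, and real speaker embeddings are learned and much larger.

```python
import numpy as np

def cosine_similarity(a, b):
    """Return (a . b) / (|a| |b|), a value in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical speaker embeddings, invented for this example
speaker_a = [0.9, 0.1, 0.3, 0.0]
speaker_a_other_clip = [0.8, 0.2, 0.35, 0.05]
speaker_b = [0.0, 0.9, -0.2, 0.4]

print(round(cosine_similarity(speaker_a, speaker_a_other_clip), 3))  # close to 1: similar voices
print(round(cosine_similarity(speaker_a, speaker_b), 3))             # close to 0: dissimilar voices
```

Because the measure ignores vector length, two clips of the same speaker can score as similar even if one embedding has a larger overall magnitude.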
They used WaveGlow, a neural vocoder, to convert the generated mel spectrograms back into audio.
There seems to be a lack of diversity in the available audio datasets. That is a problem, considering how much potential this area of research has.
Their method is super fast, one of the fastest, although the quality may not be the best. Still very good!