NikolaNews

Interview With Kaggle Master Bac Nguyen Xuan

July 13, 2020
in Machine Learning

“To be at the top, one has to be aggressive, hardworking and creative.”

Bac Nguyen Xuan

For this week’s ML practitioner’s series, Analytics India Magazine got in touch with Bac Nguyen Xuan, a Kaggle Master who is currently ranked 56th in the world. In this interview, Bac talks about the tricks behind his Kaggle success.

On His Initial Days

Bac has a master's degree in Computer Science from Chonnam National University in South Korea, where he picked up machine learning and fell in love with computer vision, especially medical image processing. Before starting his journey in machine learning, Bac had a brief stint as an embedded software engineer in Japan. While visiting a tech exhibition in Japan, he was fascinated by presentations of self-driving cars, a robot playing table tennis and other technologies. That was when he decided to quit his job as a software engineer and pursue ML.

Bac currently works as an AI Research Engineer at VinAI, where he and his team conduct high-impact research that pushes the knowledge frontier in AI and accelerates applications of AI in Vietnam, the Asia Pacific region, and beyond.



Bac strengthened his theoretical foundations by taking Andrew Ng's online courses on Coursera and by reading Deep Learning by Goodfellow, along with Machine Learning Coban (which translates to The Basics of Machine Learning).

“I asked my brother, who is also a Kaggler, how to become an ML Engineer. He told me to join Kaggle and practice. It is a great place to not only practice but also to learn new things that have not been mentioned in any books. Then, I set my target to beat him first. Thus, my journey started,” says Bac.




On Kaggle Success

Upon joining Kaggle, Bac skimmed through every kernel, discussion thread and top solution from previous competitions to recognise the patterns of winning approaches.

He supplemented this by reading advanced research papers from top-tier conferences such as CVPR, ICCV and ECCV, and tried to implement them in Kaggle competitions. “Even though I know they might not be helpful for the competition too much, they helped me to widen my knowledge, sharpen my skills. Who knows, they will be useful for the next competitions. Learn and apply everything you can,” says Bac, with great optimism.

Bac earned his first gold medal in the Google Quick Draw competition. “In this competition, you are given large-scale data. Your mission is to predict 340 classes of sketches. There are 112200 test samples; let's do a simple calculation: 112200 / 340 = 330 with remainder 0. It was likely that there were 330 samples/class. We can verify it by ‘leaderboard probing’. If there are 330 samples/class, the leaderboard score will be: 330 * (1 + 1/2 + 1/3) / 112200 ≈ 0.005.

“I received exactly that score from the leaderboard. Thus, the assumption that there are 330 samples/class is correct. To leverage this assumption, we should do ‘probability calibration’. The lesson learnt is to understand evaluation metrics and hack the way organisers split the leaderboard!” says Bac.
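The arithmetic behind the probe can be checked with a short sketch. It assumes the competition was scored with MAP@3 (a row earns 1, 1/2 or 1/3 depending on where its true class appears among the three predictions) and that the probe submits the same three classes for every test row. The function name is hypothetical and is not taken from Bac's code.

```python
from fractions import Fraction


def constant_submission_map_at_3(samples_per_class, total_samples):
    """Expected MAP@3 when the same three classes are submitted for every row.

    Assumes MAP@3 scoring: a row earns 1, 1/2 or 1/3 when its true class is
    in position 1, 2 or 3 of the prediction, and 0 otherwise.
    """
    position_weights = [Fraction(1), Fraction(1, 2), Fraction(1, 3)]
    # Only rows whose true class is one of the three submitted classes score.
    return samples_per_class * sum(position_weights) / total_samples


n_test, n_classes = 112200, 340
per_class, remainder = divmod(n_test, n_classes)
expected = constant_submission_map_at_3(per_class, n_test)

print(per_class, remainder)        # 330 0
print(round(float(expected), 4))   # 0.0054
```

If the leaderboard returns roughly this value for such a constant submission, the guess of 330 samples per class is consistent with the hidden test set.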

Bac used the same trick in the IEEE’s camera model identification competition, which landed him in the top-10 on the leaderboard.

In the Recursion Cellular Image Classification competition, contestants were given images from several drug experiments, each with 1108 classes, in which each class appears only once. Bac applied the same probability calibration and achieved a significant improvement of 10%. Bac therefore insists that success on Kaggle takes creativity.
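The interview does not say how the calibration was implemented. As a rough illustration of exploiting a known class count, the sketch below greedily assigns labels so that no class is used more often than a fixed capacity. The function name and the toy probabilities are hypothetical, and the greedy rule is a simplification: an optimal assignment could instead be computed with the Hungarian algorithm, for example via `scipy.optimize.linear_sum_assignment`.

```python
def assign_with_class_capacity(probs, capacity):
    """Greedily pick one label per sample, using each class at most `capacity` times.

    probs[i][k] is the predicted probability of class k for sample i.
    Every sample gets a label as long as len(probs) <= capacity * n_classes.
    """
    n_samples, n_classes = len(probs), len(probs[0])
    # Visit (sample, class) pairs from most to least confident.
    pairs = sorted(
        ((probs[i][k], i, k) for i in range(n_samples) for k in range(n_classes)),
        reverse=True,
    )
    labels = [None] * n_samples
    remaining = [capacity] * n_classes
    for _, i, k in pairs:
        if labels[i] is None and remaining[k] > 0:
            labels[i] = k
            remaining[k] -= 1
    return labels


# Toy predictions: plain argmax puts every sample in class 0.
probs = [
    [0.60, 0.30, 0.10],
    [0.55, 0.40, 0.05],
    [0.50, 0.15, 0.35],
]
argmax_labels = [row.index(max(row)) for row in probs]
balanced_labels = assign_with_class_capacity(probs, capacity=1)

print(argmax_labels)    # [0, 0, 0]
print(balanced_labels)  # [0, 1, 2]
```

With a capacity of one label per class, the two less confident samples fall back to their next-best classes instead of all three collapsing onto class 0.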

“You can easily get lost when you are a beginner. If you want to go further, you should go together.”

Underlining the importance of having the right team, Bac remembered how unproductive his initial attempts were due to lack of direction and how having a teammate helped him progress. “I did not hesitate to contact the potential persons when I saw them at the top-50 on the leaderboard to get a hope that they would give me a chance to work together,” says Bac.

He also says that the benefits of Kaggle extend far beyond the leaderboard. Before Kaggle, he was told that he would not be the right fit for an ML job. Today, Bac is flooded with job invitations thanks to Kaggle.

His Kaggle experience has also helped him in his current research position. Preparing for and taking part in competitions acquainted him with state-of-the-art techniques, which in turn have proven handy at his workplace.


“I like the clean code. It is the key to get other members to catch up with your work.”

To construct the baseline, Bac runs his code on a 1080Ti and two 2080Ti GPUs. When it comes to frameworks, he prefers PyTorch, especially with Catalyst, which he says helps in collaborating with the team. When dealing with tabular data, Bac prefers lightgbm and rapidsai for speed and efficiency.

Tips for finding a winning solution for beginners from Bac:

  • Exploratory Data Analysis. Simple visualisations can give you a good feel for the data and the problem.
  • Understand the evaluation metric.
  • Be careful with the evaluation metric. For example, Log Loss is very sensitive to the distribution, which can cause a big shake-up of the leaderboard.
  • Understand the leaderboard proportions, which can help you build a good CV (cross-validation) setup.
  • Experiment fast to increase your chances of winning.
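To illustrate the point about Log Loss, here is a minimal sketch that scores a constant prediction under two hypothetical class balances. The 50% and 10% positive rates are made-up numbers chosen for illustration and do not come from any competition.

```python
import math


def constant_log_loss(predicted_positive_rate, true_positive_rate):
    """Binary log loss when every row receives the same predicted probability."""
    p, q = predicted_positive_rate, true_positive_rate
    return -(q * math.log(p) + (1 - q) * math.log(1 - p))


train_rate = 0.5  # hypothetical share of positives in the training data
test_rate = 0.1   # hypothetical share of positives behind the leaderboard

mismatched = constant_log_loss(train_rate, test_rate)
matched = constant_log_loss(test_rate, test_rate)

print(round(mismatched, 3))  # 0.693
print(round(matched, 3))     # 0.325
```

The same uninformative model scores about twice as badly when its predicted rate does not match the test distribution, which is one reason rankings can shift sharply between the public and private leaderboards.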

To be at the top, Bac advises one to be aggressive and hardworking. “I believe that to be at the top of Kaggle, you should work as a full-time job. Read and try all the discussions/ideas from not only the current competition but also previous similar competitions. The last two weeks are very important. There are a lot of changes in the leaderboard. Other teams can merge, and that will take a toll on your ranking, which can be demotivating,” warns Bac.

Final Thoughts

Talking about the unprecedented attention towards ML across the globe, Bac admits that there is a widespread misconception that deep learning should be used to solve every problem. 

That said, Bac believes that the ongoing pandemic will fuel more investment in machine learning-based applications in healthcare. However, one of the biggest challenges in medicine is the lack of data. “It is not easy to publicize the dataset due to privacy. Thus, federated learning can be a possible solution along with AutoML,” speculates Bac.

Added to this are ever-evolving machine learning communities such as Kaggle, which are creating more awareness across the globe. The democratisation of ML is fueled by the incentivisation of problem solving through competitions, and this will only continue in the coming years.

For beginners who are keen to be part of this journey, here are some additional tips from Bac:

  • Focus on only one competition at a time. If you do many at the same time, you will probably learn nothing and tire easily.
  • Don’t skip any ideas coming from the discussions. An idea that was not useful to others might still help you a lot.
  • Don’t let your GPU sleep.
  • If you come across a great method in a top solution, try it in the next competitions.


Credit: Google News

© 2019 NikolaNews.com - Global Tech Updates