Before we get started, I thought I should give a bit of background on my motivation for writing this article. I recently graduated with a bachelor’s degree in Computer Science. About halfway through my 3rd year, I knew that the only area I wanted to work in was AI. My college didn’t have any AI-specific courses, and there weren’t many AI internships going around in Pune. I am happy to say that I am now working in an AI research & development team as a graduate intern.
The field of machine learning is becoming more and more mainstream every year. With this growth come many libraries and tools that abstract away some of the most difficult concepts to implement, which makes life easier for people who are just starting out.
Most people will say you need a higher-level degree in ML to work in the industry. If you love working with data and practical math, then I would say this is not true. None of this information is mind-blowing, and most of these tips are pretty obvious. However, much like eating healthily and exercising, I have found that although everyone knows this is the stuff you need to do, a lot of people still don’t do it. I am hoping this article will help people set out their own plan for moving into the exciting world of AI.
ML is a really unique field for software graduates and junior developers. The area has really only taken off over the last 5 years and is still relatively young. This provides a real problem/opportunity for both new developers and employers.
Graduates: Have little solid information about what the field is like, take few relevant modules in college, and find it hard to get relevant experience.
Employers: Find it very difficult to hire people with relevant experience.
This is a tough dilemma to get past as a college student, but it also provides a great opportunity. There is currently a huge deficit in the number of qualified machine learning (ML) developers. Companies across the board are hiring for these roles and can’t fill them. If you can show that you have the relevant expertise, you will be a highly desirable candidate that will stand out not just from the pool of graduates, but experienced hires as well.
This all sounds great in theory, but of course it isn’t that simple. There is a reason these people are hard to find. It is a tough area to become proficient in and the field is growing at a rapid rate with more advances to keep up with every month.
I already knew Python when I started, but if you don’t, I recommend learning basic and intermediate Python first. The language is fairly easy to learn compared to others. Python is also home to the largest data science/ML community, so there are tons of tools to help you as you learn.
Learn Python: FreeCodeCamp Python Crash Course
Anaconda & Jupyter Notebook — These are a must for ML & data science. Follow the instructions here to install and set them up.
Visual Studio Code with Python Plugin — I never thought I would be recommending a Microsoft product, but I am honestly impressed with their open source commitment lately. This is now my favorite code editor, even for doing some things in Python — like debugging code.
Kaggle.com is the best place to find datasets when you are starting out. Go ahead and sign up for an account and poke around the site. You will notice that there are many competitions for people of all experience levels and even tutorials to go with them (like this beginner-friendly one about the Titanic). These datasets will be very helpful to practice with while you are learning Python libraries.
Next, it’s important to learn the common Python libraries for working with data: NumPy, Matplotlib, Pandas, Scikit-Learn, etc. I recommend starting with this course from DataCamp. It goes over some basics, which you can skip or use for review, and the NumPy section is a good intro.
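To give a feel for what NumPy adds over plain Python lists, here is a minimal sketch of three ideas you will use constantly: vectorized arithmetic, broadcasting, and reshaping data into a feature matrix. The house sizes and prices are made-up numbers for illustration only.

```python
import numpy as np

# Made-up data (illustrative only): house sizes in m^2 and prices in thousands
sizes = np.array([50.0, 80.0, 120.0, 150.0])
prices = np.array([150.0, 240.0, 360.0, 450.0])

# Vectorized arithmetic: element-wise division with no explicit Python loop
price_per_m2 = prices / sizes

# Broadcasting: the scalar mean and std are applied across the whole array
standardized = (sizes - sizes.mean()) / sizes.std()

# Reshape into a 2-D feature matrix (rows = samples, columns = features),
# the shape most ML libraries expect as input
X = sizes.reshape(-1, 1)

print(price_per_m2)  # → [3. 3. 3. 3.]
print(X.shape)       # → (4, 1)
```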
Pandas is a must-learn, but it also takes a little while to grasp since it does so many things. It’s built on top of NumPy and is used for cleaning, preparing, and analyzing data. It also has built-in tools for things like visualization. I used a lot of resources to learn and practice Pandas. Here are a few:
- Learn Pandas on Kaggle
- Learn Pandas Video Course | Notebook for Course
- Jupyter Notebook Extra Examples: Basics | Plotting with Matplotlib & Pandas | And Many More
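As a rough illustration of the clean, prepare, analyze workflow, the sketch below builds a tiny table with column names loosely modeled on Kaggle’s Titanic dataset. The values are invented for illustration, not taken from the real file. It fills missing values, encodes a text column as a number, and computes a group average.

```python
import numpy as np
import pandas as pd

# Tiny made-up table; column names loosely follow the Titanic dataset
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan],
    "Survived": [0, 1, 1, 0, 0],
})

# Cleaning: replace missing ages with the median of the known ages
df["Age"] = df["Age"].fillna(df["Age"].median())

# Preparing: encode a text column as 0/1 so models can use it
df["IsFemale"] = (df["Sex"] == "female").astype(int)

# Analyzing: average survival rate for each group
survival_by_sex = df.groupby("Sex")["Survived"].mean()
print(survival_by_sex)
```

With the real dataset you would start from `pd.read_csv` on the downloaded file instead of building the DataFrame by hand.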
After Pandas comes Scikit-Learn. This is where you start applying actual machine learning algorithms. Scikit-Learn is a scientific Python library for machine learning.
The best resource I have found for this so far is the book “Hands-On Machine Learning with Scikit-Learn and TensorFlow”. I think it does a very good job of teaching you step by step with practical examples. The first half is about Scikit-Learn, so I did that part first and then came back to the TensorFlow portion.
There are many other Python libraries like Keras and PyTorch, but I will get into those later. This is already a lot to learn.
This is the first step into machine learning. Scikit-Learn has classical machine learning algorithms (sometimes called “shallow” learning), like linear regression, built into the library. The Scikit-Learn book I mention above teaches many types of common machine learning algorithms and lets you practice with hands-on examples.
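Most Scikit-Learn models share the same `fit`/`predict` pattern. A minimal sketch, assuming synthetic data generated from y = 3x + 2 plus a little noise, might look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (illustrative only): y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(100, 1))           # 100 samples, 1 feature
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# Hold out 20% of the samples to measure how well the model generalizes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training set, then predict on unseen data
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

mse = mean_squared_error(y_test, predictions)
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}, test MSE={mse:.3f}")
```

Swapping `LinearRegression` for another estimator, such as `DecisionTreeRegressor` from `sklearn.tree`, leaves the rest of the code unchanged, which is a big part of why the library is so beginner-friendly.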
While that’s good, I still found it useful to also go through Andrew Ng’s Machine Learning course from Stanford. It’s available free on Coursera. The quality of instruction is amazing, and it’s one of the most recommended resources online (it’s not the easiest to get through, which is why I recommend it down here).
Start going through the Andrew Ng course slowly and don’t get frustrated if you don’t understand something. I had to put it down and pick it up several times.
Yes, math is necessary. However, I don’t feel like an intense, math-first approach is the best way to learn; it’s intimidating for many people. In my opinion, you should spend most of your time learning practical machine learning and maybe 15–20% studying the math.
I think the first step here is to learn or brush up on statistics. It is easier to digest and can be both a lot of fun and practical. After statistics, you will definitely need to learn a bit of linear algebra and some calculus to really know what’s going on in deep learning. This will take some time, but here are some of the resources I recommend.
Statistics Resources:
- I think the statistics courses on Udacity are quite good. You can start with this one and then explore the other ones they offer.
- I loved the book, “Naked Statistics”. It’s full of practical examples and enjoyable to read.
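One of the first practical lessons in statistics is that the mean and the median can tell different stories. This small sketch uses Python’s built-in `statistics` module on a made-up list of exam scores containing one outlier:

```python
import statistics

# Made-up exam scores (illustrative only); 99 is an outlier
scores = [62, 70, 71, 73, 75, 78, 99]

mean = statistics.mean(scores)      # arithmetic average, sensitive to outliers
median = statistics.median(scores)  # middle value, robust to outliers
stdev = statistics.stdev(scores)    # sample standard deviation (divides by n - 1)

# z-score: how many standard deviations the outlier sits above the mean
z_outlier = (max(scores) - mean) / stdev

print(f"mean={mean:.2f}, median={median}, stdev={stdev:.2f}, z={z_outlier:.2f}")
```

The single high score pulls the mean above the median, and its z-score comes out a little above 2, a common rule-of-thumb threshold for flagging an unusual value.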
Linear Algebra Resources:
- The book “Linear Algebra, Step by Step” is excellent. It reads like a high school/college textbook but is well written and easy to follow. There are also plenty of exercises for each chapter, with answers in the back.
- Essence of Linear Algebra video series — The math explanations by 3blue1brown are amazing. I highly recommend his math content.
- There is an overview of linear algebra in the Andrew Ng course as well, but I think the two resources listed above are a bit easier to learn the subject from.
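The linear algebra you will use most in deep learning is the matrix-vector product. A fully connected layer computes `W @ x + b`, and each output entry is just the dot product of one row of `W` with the input. The weights below are arbitrary numbers chosen so the arithmetic is easy to check by hand:

```python
import numpy as np

# Arbitrary 2x3 weight matrix: maps a 3-dimensional input to a 2-dimensional output
W = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
x = np.array([3.0, 4.0, 5.0])
b = np.array([0.5, -0.5])

# Matrix-vector product plus bias, as in a dense layer
y = W @ x + b
# Row 1: 1*3 + 0*4 + 2*5 + 0.5 = 13.5
# Row 2: 0*3 + 1*4 - 1*5 - 0.5 = -1.5

# The same result built row by row from explicit dot products
y_manual = np.array([np.dot(row, x) for row in W]) + b

print(y)  # → [13.5 -1.5]
```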
Calculus Resources:
I had taken a few years of calculus before, but I still needed to brush up quite a bit. Here are some online resources that helped me.
- Essence of Calculus video series
- Understanding Calculus from The Great Courses Plus
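Calculus shows up in ML mainly through derivatives, which tell an optimizer which direction to move the parameters. Here is a minimal gradient descent sketch on a toy loss, f(w) = (w - 3)^2, along with a numerical check of the hand-derived derivative. The loss, starting point, and learning rate are arbitrary choices for illustration:

```python
def f(w):
    """Toy convex loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def df(w):
    """Derivative by the chain rule: d/dw (w - 3)^2 = 2(w - 3)."""
    return 2.0 * (w - 3.0)


def numerical_derivative(func, w, h=1e-6):
    """Central-difference approximation, handy for checking hand-derived gradients."""
    return (func(w + h) - func(w - h)) / (2 * h)


# Gradient descent: repeatedly step against the slope
w = 0.0
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * df(w)

print(f"w after 100 steps: {w:.6f}")
print(f"analytic df(1) = {df(1.0)}, numerical = {numerical_derivative(f, 1.0):.6f}")
```

Each step multiplies the distance to the minimum by 1 - 2 × 0.1 = 0.8, so after 100 steps `w` is extremely close to 3.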
Other Helpful Math:
- Mathematical Decision Making from The Great Courses Plus
After learning some math and the basics of data science and machine learning, it’s time to jump into more algorithms and neural networks.
You probably got a taste of deep learning already with some of the resources I mentioned in part 1, but here are some really good resources to introduce you to neural networks anyhow. At least they will be a good review and fill in some gaps for you.
- 3blue1brown’s Series Explaining Neural Networks
- Deeplizard’s Intro to Deep Learning Playlist
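A classic way to see why neural networks need hidden layers and nonlinearities is XOR: no single linear layer can separate its outputs, but a tiny network with one ReLU hidden layer can. The weights in this sketch are set by hand, not learned, purely to make the forward pass easy to follow:

```python
import numpy as np


def relu(z):
    """Rectified linear unit: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)


# Hand-picked (not learned) weights for a 2-2-1 network that computes XOR
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])
b2 = 0.0


def forward(X):
    """Forward pass: linear layer, then ReLU, then a linear output unit."""
    hidden = relu(X @ W1 + b1)
    return hidden @ w2 + b2


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(forward(X))  # → [0. 1. 1. 0.]
```

In practice the weights are learned with gradient descent and backpropagation rather than chosen by hand.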
While you are working through the Andrew Ng Stanford course, I recommend checking out fast.ai. They have several high-quality, practical video courses that can really help you learn and cement these concepts. The first is Practical Deep Learning for Coders, and the second, just released, is Cutting Edge Deep Learning For Coders, Part 2. I picked up so many things from watching and re-watching some of these videos. Another amazing feature of fast.ai is the community forum, probably one of the most active AI forums online.
I think it’s a good idea to learn a little bit of all three of these libraries. Keras is a good place to start, as its API is designed to be simpler and more intuitive. Right now I use PyTorch almost exclusively, and it’s my personal favorite, but they all have pros and cons, so it’s good to know which one to choose in different situations.
Keras
TensorFlow
PyTorch
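As a taste of what these libraries look like, here is a minimal sketch of a typical PyTorch training loop, fitting a single `nn.Linear` layer to synthetic data generated from y = 2x - 1 plus noise. The data, learning rate, and number of epochs are arbitrary choices for illustration:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic data (illustrative only): y = 2x - 1 plus a little noise
X = torch.linspace(-1, 1, 100).unsqueeze(1)  # shape (100, 1)
y = 2 * X - 1 + 0.05 * torch.randn_like(X)

model = nn.Linear(in_features=1, out_features=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(500):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(X), y)  # forward pass and loss
    loss.backward()              # backpropagation computes gradients
    optimizer.step()             # gradient descent update

print(f"weight={model.weight.item():.3f}, bias={model.bias.item():.3f}")
```

The zero_grad, forward, backward, step cycle is the same skeleton you will see in most PyTorch training code, just with bigger models and data loaders.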
I have found it very helpful to read current research as I learn. There are plenty of resources that help make complicated concepts, and the math behind them, easier to digest. These papers are also a lot more fun to read than you may realize.
- fast.ai blog
- Distill.pub — Machine Learning Research explained clearly
- Two Minute Papers — Short video breakdowns of AI and other research papers
- Arxiv Sanity — More intuitive tool to search through, sort, and save research papers
- Deep Learning Papers Roadmap
- Machine Learning Subreddit — They have ‘what are you reading’ threads discussing research papers
- Arxiv Insights — This channel has some great breakdowns of AI research papers
- The Data Skeptic — They have a lot of good shorter episodes, called [mini]s where they cover machine learning concepts
- Software Engineering Daily Machine Learning
- OCDevel Machine Learning Podcast
Please clap if this was helpful.
If you know of any other resources that are good, or see that I am missing something, please leave links in the comments. Thank you.
Credit: BecomingHuman By: Swanand Katdare