Before you dive deep into Data Science, it helps to have an idea of its different areas.
If you are trying to get into Data Science, you may have been confused about its different areas, how they relate to each other, and whether they are the same thing. People in the field often use these terms interchangeably because most of them are interrelated. This is a high-level article in which I talk about the different areas of Data Science.
Why is Data Science so important? With most of the world running on computers these days, tremendous amounts of data are being generated. Why does that matter?
To make it more concrete, consider a restaurant. It has to cook just enough food each day. If it cooks more than it needs, food is wasted along with the money spent on it; if it cooks less, it not only misses out on profit but may also hurt its reputation. Traditionally, a restaurant could use trial and error to figure out the optimal quantity for each item it serves, and every trial generates data points about demand. But without Data Science, predicting the optimal quantities by hand is cumbersome. This simple problem is one use case of Data Science, and it is called demand forecasting.
My point is that with data, businesses can make better-informed decisions about things like the ideal location for the next branch or which features give the highest ROI. Data Science is a tool that makes businesses more profitable, which is why they invest so much in it.
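To give a feel for what demand forecasting can look like in code, here is a minimal sketch that predicts tomorrow's demand as the average of the last few days. The function name `forecast_next_day`, the `window` parameter and the sales numbers are all made up for illustration; a real forecast would usually also account for weekdays, seasons, holidays and so on.

```python
from statistics import mean


def forecast_next_day(daily_sales, window=7):
    """Forecast tomorrow's demand as the mean of the last `window` days."""
    if not daily_sales:
        raise ValueError("need at least one day of sales history")
    # Only the most recent days are used, so old trends fade out.
    recent = daily_sales[-window:]
    return mean(recent)


# Hypothetical plates of one dish sold over the last ten days.
plates_sold = [42, 38, 45, 50, 47, 41, 39, 44, 48, 46]

forecast = forecast_next_day(plates_sold, window=7)
print(f"Cook about {round(forecast)} plates tomorrow")
```

A moving average is about the simplest possible choice here. It is only meant to show the idea of turning past data points into a prediction.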
Let’s see a brief overview of important areas in Data Science:
Data Science (DS):
As the name suggests, it is the science of data: a set of techniques for extracting information from data. Its applications are vast. It covers data collection, data cleaning, data analysis, data mining, data modeling and more. Name anything that involves dealing with data and it falls under this domain. It uses tools like Machine Learning, Statistics and Linear Algebra.
Data Science is not only about building models. It also involves identifying the areas that could benefit from models, testing hypotheses about them, and analyzing their impact in production using A/B testing. In short, it also covers the phases before and after model building.
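As a rough illustration of the A/B testing step, the sketch below compares the conversion rate of a group served by an old model (A) with a group served by a new model (B) using a two-proportion z-test. This is just one common way to analyze an A/B test, not the only one, and the function name and all the counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF written with the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical numbers: 1000 users per group, 100 vs 150 conversions.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the two groups is unlikely to be pure chance, which is the kind of evidence you want before rolling a new model out to everyone.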
Artificial Intelligence (AI):
AI is intelligence that is generated rather than acquired. Most people say that AI is the intelligence demonstrated by machines, but I think we are moving towards a future where humans might also use intelligence generated in labs. I am talking about implanting chips in our brains here (would you get a chip implanted if you were given the chance? I would love to read your thoughts in the comments section). My point is that AI doesn’t have to be specifically machine intelligence. Just so you know, there is a test called the Turing Test, proposed by the great Alan Turing. It checks whether a machine exhibits intelligence by assessing whether it can imitate a human in conversation well enough that an evaluator cannot reliably tell the two apart.
Machine Learning (ML):
Machine Learning is a subset of AI and a superset of Deep Learning. It makes a machine an expert in a specific problem by providing it with a lot of examples. We humans can estimate the chance of rain from environmental factors like whether it is cloudy or warm. We can do this only because we have seen it happen so many times before (examples). But we are limited to deriving relationships between a small number of factors. That’s why we need ML: it can predict a dependent variable given “n” independent variables. In our example the dependent variable is “Is it going to rain?”, and the information we used (“is it cloudy?”, “is it warm?” etc.) are the independent variables, called features in the ML world.
The idea is to let the machine automatically come up with a mathematical formula that captures the relationship between the independent and dependent variables, so that it can predict the dependent variable from the independent ones. It does this by looking at tons of examples of that relationship. If you have wondered why people say you need a good understanding of Mathematics to learn ML and DL, this is why. We use Linear Algebra and Statistics at different points of model building. Linear Algebra plays a prominent role in feature engineering, while Statistics is used at the core of model building and in phases like model selection.
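To make the "formula learned from examples" idea concrete, here is a small sketch of the rain example using logistic regression trained with gradient descent in plain Python. The choice of logistic regression, the function names, the tiny dataset and the hyperparameters are all assumptions made for illustration; in practice you would use a library and far more data.

```python
from math import exp


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict_rain(weights, bias, features):
    """Probability of rain according to the learned formula."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def train(examples, labels, learning_rate=0.5, epochs=2000):
    """Learn weights and bias by nudging them after every example."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            error = predict_rain(weights, bias, features) - label
            # Move each weight against the error, scaled by its feature.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical examples. Features: [is_cloudy, is_warm]; label: 1 = it rained.
examples = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
labels = [1, 1, 0, 0, 1, 0]

weights, bias = train(examples, labels)
print("cloudy and cool:", predict_rain(weights, bias, [1, 0]))
print("clear and warm:", predict_rain(weights, bias, [0, 1]))
```

Here the "formula" is just the learned weights and bias. Nobody told the machine that clouds matter more than warmth; it picked that up from the examples.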
Deep Learning (DL):
A subset of ML that uses multi-layer neural networks. The main difference between classical ML and DL is that DL needs much less manual feature engineering: the depth of the network lets it learn useful features on its own. It is a more advanced topic than ML, and it can do remarkable things if you have enough data. Almost all advanced applications of AI are built with Deep Learning, for example self-driving cars, voice assistants and facial recognition.
There are many pre-trained models available these days, so you don’t have to build a model from scratch; you can fine-tune a pre-trained model for your use case instead. For example, if you want to build a voice assistant for customer care, you can take the pre-trained model closest to your use case and fine-tune it on your own data.
Computer Vision (CV):
Computer Vision is like building eyes for machines. Every problem that involves images or videos falls under Computer Vision. Some problems that can be solved using CV are image classification, object detection, segmentation and gesture recognition. Computer Vision can use both Machine Learning and Deep Learning, but I prefer Deep Learning.
Natural Language Processing (NLP):
You can think of this area as the ears and mouth of a machine. With it, you can make machines interpret what is going on in audio and text. Some of the problems you can solve using Natural Language Processing are speech to text, text to speech, text extraction, named entity recognition, text summarization, text classification and text generation. You would use very different techniques for each of these problems. As with Computer Vision, you could use Machine Learning or Deep Learning to solve them, but I prefer Deep Learning.
When not to use:
Machine Learning or Deep Learning is not the ideal solution for every problem. Implementing an ML or DL algorithm is not simple: a lot of time and effort goes into generating a good dataset, cleaning and preparing the data, training the model, testing it and so on. It is important to understand when to use ML and when not to.
One rule of thumb I use is to ask whether the problem is simple enough to solve without ML. For example, take the problem of extracting the age from a form. You can certainly use a DL-based Named Entity Recognition model to solve it, but you could also use a regex to extract the age if you know your form always asks the same question in the same format.
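Here is roughly what the regex route could look like. The form layout (a line like "Age: 34"), the pattern and the function name `extract_age` are assumptions for the sake of the example; you would adapt the pattern to whatever your form actually looks like.

```python
import re

# Assumes the form always writes the age as "Age: <number>".
AGE_PATTERN = re.compile(r"Age:\s*(\d{1,3})\b")


def extract_age(form_text):
    """Return the age as an int, or None if the form has no age field."""
    match = AGE_PATTERN.search(form_text)
    return int(match.group(1)) if match else None


# Hypothetical form contents.
form = "Name: Jane Doe\nAge: 34\nCity: Pune"
print(extract_age(form))
```

A few lines of code, no dataset, no training. If the form format starts to vary a lot, that is the point where something like NER begins to earn its cost.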