Data scientist has been called the sexiest job of the 21st century, and with good reason. In LinkedIn's 2020 Emerging Jobs Report, artificial intelligence was named among the ‘Jobs of Tomorrow’ due to its strong presence. Furthermore, the potential applications of data science across multiple industries have attracted people from all backgrounds into this field. Here I present the five most important skills that a data scientist needs for their work.
1. Probability & Statistics
Probability and statistics are two closely related branches of mathematics. You cannot fully understand one without the other, and together they equip you with the techniques to work with data. Since there is no data scientist without data, these two skills form your most fundamental prerequisite.
Some of the relevant concepts you should be familiar with:
- Random Variables
- Basic and Conditional Probability
- Probability Distribution
- Sampling Methods
- Measures of Central Tendency, Variability & Confidence Intervals
- Hypothesis Testing
- Central Limit Theorem
- Experimental Design
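Several of the concepts above (sampling, the Central Limit Theorem and confidence intervals) can be tried out in a few lines. The following is a minimal sketch using only Python's standard library. The exponential population, the sample sizes and the function names are assumptions made for illustration, and the interval uses the normal critical value 1.96 as an approximation.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, rng):
    """Draw n_samples random samples (with replacement) and return their means."""
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

def confidence_interval_95(sample):
    """Approximate 95% confidence interval for the mean (normal critical value 1.96)."""
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - 1.96 * standard_error, mean + 1.96 * standard_error

rng = random.Random(42)  # fixed seed so the run is repeatable
# A skewed (exponential) population: clearly not normally distributed.
population = [rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, sample_size=50, n_samples=1_000, rng=rng)
low, high = confidence_interval_95(rng.choices(population, k=200))

print("population mean:", round(statistics.fmean(population), 3))
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std of sample means:", round(statistics.stdev(means), 3))
print("95% CI from one sample of 200:", (round(low, 3), round(high, 3)))
```

Even though the population is skewed, the sample means cluster tightly around the population mean, which is the behaviour the Central Limit Theorem describes.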
2. Calculus & Linear Algebra
Calculus and linear algebra are two more mathematical subjects that are indispensable for a professional data scientist. They are the backbone of most, if not all, machine learning algorithms, so a solid grasp of both is necessary to understand how these algorithms work. Having said that, a general understanding might be sufficient to get started, as libraries that perform these mathematical operations under the hood are available.
Again, some of the more relevant concepts for data science:
- Univariate and Multivariate Calculus
- Differentiation and Integration
- Vector Spaces
- Dot Product
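To see how these concepts meet in practice, here is a small sketch in plain Python: a dot product, the derivative of a squared-error loss obtained with the chain rule, and a numerical check of that derivative. The vectors, the loss function and the function names are made up for illustration and are not taken from any particular library.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

def loss(w, x, y):
    """Squared error of a linear prediction w . x against target y."""
    return (dot(w, x) - y) ** 2

def analytic_gradient(w, x, y):
    """Gradient of the loss with respect to w, from the chain rule: 2 * (w . x - y) * x."""
    error = dot(w, x) - y
    return [2 * error * xi for xi in x]

def numeric_gradient(w, x, y, h=1e-6):
    """Central-difference approximation of the same gradient."""
    grad = []
    for i in range(len(w)):
        w_plus = w[:i] + [w[i] + h] + w[i + 1:]
        w_minus = w[:i] + [w[i] - h] + w[i + 1:]
        grad.append((loss(w_plus, x, y) - loss(w_minus, x, y)) / (2 * h))
    return grad

w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
y = 4.0

print("w . x =", dot(w, x))
print("analytic gradient:", analytic_gradient(w, x, y))
print("numeric gradient: ", numeric_gradient(w, x, y))
```

The two gradients should agree to several decimal places. Checking a hand-derived gradient against a numerical one like this is a common sanity test when implementing a learning algorithm from scratch.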
3. Programming
Programming is arguably the most important skill of a data scientist. Besides having the knowledge to work with data, data scientists need the tools and skills to turn their theoretical knowledge into practical implementations. This is commonly done through some form of programming, which is why programming has become one of the most sought-after skills in a data scientist.
To start off, I highly recommend learning Python as your first programming language. Python is easy to read, write and understand, and it has the most comprehensive support for data analytics work. You will almost never go wrong choosing Python as your main programming language.
Another popular programming language for data science is R. R is widely used by statisticians for data analysis; however, it is designed mainly for statistical computing rather than as a general-purpose programming language like Python.
Regardless of the language, below are some of the programming techniques you need to know:
- Basic syntax, Functions, I/O
- Flow control statements
- Object-oriented Programming (OOP)
- Libraries for handling data, such as NumPy and pandas for Python
- Regular Expressions
- Documentation (Both reading and writing)
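As an illustration of how several of these techniques combine, the sketch below uses functions, flow control, a small class, a regular expression and docstrings to clean a handful of text records. The log format, the `Order` class and the function names are all hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical log format, invented for this example: "<date> <user> spent $<amount>"
ORDER_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2}) (\w+) spent \$(\d+(?:\.\d+)?)")

@dataclass
class Order:
    """One parsed purchase record."""
    date: str
    user: str
    amount: float

def parse_orders(lines):
    """Return an Order for every line that matches ORDER_PATTERN; skip the rest."""
    orders = []
    for line in lines:
        match = ORDER_PATTERN.search(line)
        if match is None:
            continue  # flow control: ignore malformed lines
        date, user, amount = match.groups()
        orders.append(Order(date, user, float(amount)))
    return orders

def total_by_user(orders):
    """Sum the amount spent per user."""
    totals = {}
    for order in orders:
        totals[order.user] = totals.get(order.user, 0.0) + order.amount
    return totals

raw_lines = [
    "2020-01-03 alice spent $12.50",
    "2020-01-04 bob spent $7",
    "corrupted line without a price",
    "2020-01-05 alice spent $2.50",
]

orders = parse_orders(raw_lines)
print(total_by_user(orders))
```

In real projects a library such as pandas would usually take over the aggregation step, but the parsing and cleaning logic tends to look much like this.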
4. Data Visualization
Data scientists use visualization for two main purposes: exploration and storytelling. For data exploration, visualization is a great tool for getting quick insights from your data. Data scientists then decide how to test or pre-process the data based on those insights. For data storytelling, visualization can convert thousands or millions of rows of data into simple-to-digest forms for your audience. These two benefits alone make visualization a great addition to your data science toolkit.
Concepts to master for visualization:
- Common Chart Types (E.g. Bar, Scatter, Line, Histogram)
- Advanced data visualization (E.g. Heatmap, Map, Word cloud)
- Use of color
- Data visualization tools (Power BI, Tableau, matplotlib/seaborn for Python, ggplot2 for R)
- Data-ink ratio
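To make the exploration use case concrete, here is a rough sketch of what a histogram does under the hood: it groups values into bins and counts them. It prints the result as text using only the standard library, so it runs anywhere. A plotting library such as matplotlib or seaborn would draw the same counts as bars. The simulated exam scores and the function name are assumptions made for illustration.

```python
import random
from collections import Counter

def text_histogram(values, bin_width):
    """Group values into fixed-width bins and return (bin_start, count) pairs in order."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return sorted(counts.items())

rng = random.Random(0)  # fixed seed so the run is repeatable
# Made-up data: 500 simulated exam scores centered near 70, clamped to 0-99.
scores = [min(99, max(0, round(rng.gauss(70, 10)))) for _ in range(500)]

for bin_start, count in text_histogram(scores, bin_width=10):
    bar = "#" * (count // 5)  # one mark per 5 observations
    print(f"{bin_start:>3}-{bin_start + 9:<3} {bar} {count}")
```

A glance at the bars is enough to see where most scores fall and whether the distribution is skewed, which is the kind of quick insight exploration is meant to provide.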
5. Machine Learning
Wikipedia defines machine learning as ‘The scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead’. This definition conveys both the complexity and the beauty of machine learning.
In my opinion, machine learning has single-handedly driven the advancement of data analytics and artificial intelligence. In addition, machine learning is most likely the reason this blog exists: to help the huge influx of learners who came into this field following the hype. I say this in a positive tone, as I sincerely believe that everyone should have some knowledge of data science regardless of their field of expertise, because machine learning provides the means to transform an industry and our perspective of it.
All the excitement seems to arise from machine learning; however, I strongly suggest building up your fundamentals before diving into machine learning.
Some algorithms to get you started:
- Linear models (Linear regression & Logistic regression)
- Support Vector Machine (SVM)
- Decision Trees
- Neural Networks
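To show how the fundamentals feed into these algorithms, below is a minimal sketch of logistic regression trained with gradient descent in plain Python. The toy data (hours studied versus passing), the learning rate and the number of epochs are assumptions chosen for illustration. In practice you would normally reach for an established library rather than writing this yourself.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight w and bias b for p(y=1 | x) = sigmoid(w * x + b) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def predict(w, b, x):
    """Return class 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Made-up data: hours studied (x) and whether the student passed (y).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0, 5.5]
passed = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

w, b = train_logistic_regression(hours, passed)
print("weight:", round(w, 3), "bias:", round(b, 3))
print("predictions:", [predict(w, b, x) for x in hours])
```

Notice that the training loop is built from the earlier sections: a probability model, a derivative, and a short program that applies them repeatedly.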
That's it: the five most important skills of a professional data scientist, explained in one blog post.