OR: What skills data scientists need to focus on to stand out.
Competitions and hackathons have their place in teaching you AI modeling skills. However, as shown above, the problem is clear:
Competitions teach only a subset of the skills that matter in the real world, and they risk creating misleading incentives: winning against someone instead of working together, expecting unrealistic data quality, or spending time on tasks that are essentially not very useful for solving the problem at hand. All of these can become toxic habits in an actual work environment.
In contrast, several other skills matter on the job, quite apart from the fact that real-world data is almost always far from “perfect” (unlike what most competitions suggest).
To get a better understanding of what really matters in actual data science roles, here is how a Redditor describes his day-to-day work:
- Meeting with business to understand the problem
- Find the data and build data pipelines using SQL/Python
- Do analysis and build baseline model in Python/Jupyter notebook
- Once a workflow is established, I put everything in Python scripts, and run automated hyperparameter/model selection/etc. searches and standardized result output to find the best model. Also helps with reproducibility.
- Present and communicate results to business
- Develop final model package and data pipelines to deploy it in our production platforms (using OOP concepts like python classes, software engineering principles like pylint, pytest, CICD pipelines, etc.)
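The automated search step in the list above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the Redditor's actual setup: the toy data, the k-nearest-neighbour model, and the names `knn_predict`, `grid_search`, and `valid_mse` are all assumptions made for the example. In practice a library such as scikit-learn would usually handle the search.

```python
import itertools
import json

# Toy 1-D regression data (made up for illustration): y is roughly 2 * x.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8), (6.0, 12.2)]
valid = [(1.5, 3.0), (3.5, 7.0), (5.5, 11.0)]


def knn_predict(x, data, k, weighted):
    """Predict y as the (optionally distance-weighted) mean of the k nearest points."""
    neighbours = sorted(data, key=lambda point: abs(point[0] - x))[:k]
    if not weighted:
        return sum(y for _, y in neighbours) / k
    # Small constant avoids division by zero when x matches a training point.
    weights = [1.0 / (abs(px - x) + 1e-9) for px, _ in neighbours]
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / sum(weights)


def mean_squared_error(data, k, weighted):
    """Mean squared error of the k-NN model (fitted on `train`) over `data`."""
    errors = [(knn_predict(x, train, k, weighted) - y) ** 2 for x, y in data]
    return sum(errors) / len(errors)


def grid_search(grid):
    """Evaluate every combination in the grid, best validation error first."""
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append({"params": params, "valid_mse": mean_squared_error(valid, **params)})
    return sorted(results, key=lambda result: result["valid_mse"])


results = grid_search({"k": [1, 2, 3, 4], "weighted": [False, True]})
report = json.dumps(results, indent=2)  # standardized output that is easy to diff
best = results[0]
print("best parameters:", best["params"])
```

Because every run goes through the same function and writes the same JSON structure, two runs can be compared directly, which is where the reproducibility benefit comes from.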
Another Redditor comments on the above list as follows:
The list seems like a solid workflow, which pretty much guarantees job security.
Looking at the above sequence of steps, most of it is data engineering, reproducible analysis development, and then production engineering.
This connects to a recent mind-boggling statistic from a KDNuggets article, in which ML researcher Mihail Eric analyzed the data roles being hired for at every company coming out of Y-Combinator since 2012. Here’s the gist of what he found:
There are 70% more open roles at companies in data engineering as compared to data science. As we train the next generation of data and machine learning practitioners, let’s place more emphasis on engineering skills.
The “secret” to successful data science teams is data quality (and engineering) and integrating domain knowledge.
In summary, the real value of a data scientist is not just the languages, models, and tools they use. Instead, what organizations are looking for is someone who knows how to solve a problem using data, from figuring out what data to collect in the first place all the way to turning it into a simple insight.
Now, let us talk about which skills really help you get a job, excel at work, and, most importantly, enjoy what you are doing.
First, I want to focus on soft skills as they are mostly overlooked in competitions and hackathons. Technical skills will follow.
The people skills.
According to the WEF, two of 2025’s most important skills are analytical thinking and complex problem-solving. Let us apply this to the data science field and describe the most essential skills.
Here are two myths about how data scientists solve problems. One is that the problem already exists in a well-defined form, so the data scientist’s only challenge is to pick an algorithm and put it into production. The other is that data scientists should always reach for the most advanced algorithms, as if a fancier model equaled a better solution.
The reality is that each problem is unique and comes with different parameters. The essential skill is to figure out the most effective, and often the most efficient, approach to solving the problem. Sometimes that requires a fancier model, but more often a simple approach yields better results. The skill lies in analyzing the problem deeply, understanding it, and then deciding what solution can be built. Problem first, technology second!
Critical Thinking & Analysis
The times of heavy top-down management are (mostly) over. While a competition has a more top-down style process to follow, building a real-world project works best in a collaborative approach with a flat hierarchy.
The best data scientists do not just follow orders but learn how to think independently. This not only helps to approach a problem differently but also improves team communication, educates business leaders, and strengthens the overall leadership of an organization. All of this ties into the following skill set:
Communication & Collaboration
A data scientist has to be able to communicate results and automate analyses. From a technical standpoint, this is typically done in Power BI, Tableau, or similar tools, but direct team communication is key. This means learning to:
- Build empathy and cultural awareness
- Understand how to ask for help the right way
- Split the work amongst each other most effectively
- And much more where competitions won’t help you.
“Communicate unto the other person that which you would want him to communicate unto you if your positions were reversed.” — Plato
Active Learning & Learning Strategies
A no-brainer. Competitions lack the collaboration and interaction necessary to learn the most, either by teaching or listening to somebody else.
Leadership & Social Influence
Individuals who are able to take responsibility and drive initiatives forward in a team are what every organization is looking for. Taking responsibility is like a muscle that can be trained. You do not always need to be senior to take on leadership roles. What matters is not the size of the project but your mindset of moving things forward wherever you can.
You don’t need a title to be a leader.
“The strength of the team is each individual member. The strength of each member is the team.” — Phil Jackson
Lastly, fun is an essential “soft skill” to have. As AI influencer Eriber Weber puts it:
“Do not only optimize for income but for work that makes you happy.”
Happiness is the ultimate productivity driver. Beyond that, work (and life) is too short to spend mostly on things you do not enjoy or that do not serve a bigger purpose.
Here is how Samir, one of our Omdena project participants and a Software Engineer at Google, describes the joy of collaborating with 50 engineers from around the world:
“A group of strangers from different corners of the Earth, who have never met each other; transcending geographical borders and time zones to work together and solve fascinating social problems; whilst learning from and inspiring each other every single day! This isn’t just a figment of my imagination. Such a world exists and I am extremely grateful that I am part of such an extraordinary journey.”
Now, after touching on the key soft skills, I want to briefly talk about some overlooked hard skills. Apart from programming, ML, and EDA, there are some less obvious skills that can make or break your work.
Coming back to our earlier example, in which a senior data scientist and Redditor describes his work: most of it was data engineering.
As a data scientist, you may join thinking you’re there to build smart models and derive as much value from the data as possible. In reality, you often get held up, because your first few months go into building the infrastructure and pipelines needed just to get the data. Having already worked with some messy datasets will help you kick-start your career.
Visualization & Analytics
Visualization is almost always ignored by beginners, and even by more experienced data scientists.
Here is why visualization is so important. It can be a great help in:
- Interpreting data better and making it memorable
- Getting your insights across to (non-technical) folks
- Noticing correlations
- Spotting outliers
- Finding cause-effect relations
- And more that you won’t see until you visualize it 🙂
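As a small illustration of the outlier point, even a crude plot makes a suspicious value jump out. The sketch below draws a text histogram using only the Python standard library. The order counts are made up, and `text_histogram` is a hypothetical helper written for this example. In a real project you would more likely reach for a plotting library or a BI tool.

```python
from collections import Counter

# Made-up daily order counts; one value looks like a data-entry error.
values = [12, 15, 14, 13, 15, 16, 14, 13, 15, 140, 14, 12]


def text_histogram(data, bin_width):
    """Return one text line per non-empty bin, with one '#' per value in the bin."""
    bins = Counter((value // bin_width) * bin_width for value in data)
    lines = []
    for start in sorted(bins):
        label = f"{start:>4}-{start + bin_width - 1:<4}"
        lines.append(f"{label}| {'#' * bins[start]}")
    return lines


for line in text_histogram(values, bin_width=10):
    print(line)
```

Eleven values land in the 10-19 bin and a single value sits alone in the 140-149 bin, which is much harder to miss than when scanning the raw list.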
Version Control
Version control is the ability to manage changes to, and the configuration of, an application. It is a priceless skill in a team of developers. It lets you check files for modifications. During check-ins, you are alerted if another user has changed the same files, and you are able to merge the changes.
Paying attention to version control will make teamwork much more effective.
APIs and Command Line
You just can’t skip it; if you do, you are bound to run into it again. APIs are used almost everywhere, and you need them to excel at developing applications in any data science domain, such as cloud, IoT, and web applications. A good understanding of different storage services, security features, and automation tools will enable you to apply the best technology for the job.
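To make this a little more concrete, here is a minimal sketch of a command-line entry point that returns its result as JSON, the format most web APIs exchange. It assumes a toy linear "model". The flags `--weight` and `--bias` and the functions `build_parser` and `main` are hypothetical names chosen for the example.

```python
import argparse
import json


def build_parser():
    """Describe the command-line interface of a toy scoring script."""
    parser = argparse.ArgumentParser(description="Score one value with a toy linear model.")
    parser.add_argument("value", type=float, help="input feature value")
    parser.add_argument("--weight", type=float, default=2.0, help="hypothetical model weight")
    parser.add_argument("--bias", type=float, default=0.0, help="hypothetical model bias")
    return parser


def main(argv=None):
    """Parse arguments and return the prediction as a JSON string."""
    args = build_parser().parse_args(argv)
    response = {"input": args.value, "prediction": args.weight * args.value + args.bias}
    return json.dumps(response)


# Passing the arguments as a list keeps the function easy to test.
print(main(["3", "--bias", "1"]))
```

Saved as a script with a call to `main()` at the bottom, the same code could be run from a terminal or a scheduled job, and the JSON output could be consumed by another service.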
Your model is not a Jupyter notebook!
Deploying to the edge and/or the cloud is a must-have skill for all production applications. In fact, maintaining a model in production, including its security and upkeep, is one of the rare and sought-after skills in the field right now.
Competitions rarely end in deployment, nor do they teach you how to deploy real-world models.
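One way to picture the step from notebook to something deployable is to wrap the model in a small class that can be saved, reloaded, and tested. The sketch below is only an illustration under simple assumptions: `MeanBaselineModel` is a hypothetical stand-in for a real model, and JSON stands in for whatever serialization format your platform uses.

```python
import json
import os
import tempfile


class MeanBaselineModel:
    """Hypothetical baseline: always predicts the mean of the training targets."""

    def __init__(self, mean=None):
        self.mean = mean

    def fit(self, targets):
        if not targets:
            raise ValueError("cannot fit on an empty target list")
        self.mean = sum(targets) / len(targets)
        return self

    def predict(self, n_rows):
        if self.mean is None:
            raise RuntimeError("model is not fitted")
        return [self.mean] * n_rows

    def save(self, path):
        """Write the fitted parameter to a JSON file."""
        with open(path, "w", encoding="utf-8") as handle:
            json.dump({"mean": self.mean}, handle)

    @classmethod
    def load(cls, path):
        """Rebuild a model from a file written by `save`."""
        with open(path, encoding="utf-8") as handle:
            return cls(mean=json.load(handle)["mean"])


def test_save_and_load_round_trip():
    """pytest-style check: a reloaded model must predict like the original."""
    model = MeanBaselineModel().fit([1.0, 2.0, 3.0])
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "model.json")
        model.save(path)
        restored = MeanBaselineModel.load(path)
    assert restored.predict(2) == model.predict(2) == [2.0, 2.0]
```

A test like this can run in a CI pipeline on every change, which is exactly the kind of safety net a notebook does not give you.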
AutoML tools don’t solve every problem. You need stable, clean data for these to even be tenable. You also need someone that understands the problem enough to select useful validation steps and metrics. Often you need those metrics/validations tooled to be intuitive or explained somehow so some business-side person can even understand the results at all.
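A tiny example shows why the choice of metric needs someone who understands the problem. The labels below are made up (a hypothetical churn dataset with 5 churners out of 100 customers), and `accuracy` and `recall` are hand-written helpers for illustration rather than calls into any particular library.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the model found."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in predictions_for_positives) / len(predictions_for_positives)


# Hypothetical churn labels: 5 customers churn (1), 95 stay (0).
y_true = [1] * 5 + [0] * 95
# A "model" that always predicts the majority class.
always_stay = [0] * 100

print(f"accuracy: {accuracy(y_true, always_stay):.2f}")
print(f"recall:   {recall(y_true, always_stay):.2f}")
```

The always-stay model reaches an accuracy of 0.95 while its recall is 0.00: it looks excellent on one metric and never finds a single churner. Explaining that gap in plain words is what makes the result usable for the business side.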
So, as the skill requirements keep increasing, it is becoming very hard for people to get into this field, which, in turn, creates a shortage in the market.
It requires hard work, patience, and fun.
There is so much to learn, and one needs to be willing to do the hard work. This requires consistent effort over time, and people often aren’t patient. 🙂