I often see articles or posts that identify data integration or preparation as the key issues facing data science projects. This always puzzles me as this is not our lived experience – not what we see when we work with Fortune 500 companies adopting predictive analytics, machine learning or AI. But I think I have figured it out. The problem is as follows:
What data scientists think counts as a “data science project” is not, in fact, a data science project.
Let me illustrate this with some data from a great study. Back in 2016, the Economist Intelligence Unit published a survey report, “Broken links: Why analytics investments have yet to pay off”, and below you can see how its data appears to support the argument that data problems are #1.
Wow – pretty clear that data integration/preparation is the biggest problem, reported by nearly twice as many projects as the next one.
In fact, though, this is a subset of the data from the survey. Here’s the full data set:
Data integration and preparation only ranks #4. Problem definition/framing, Solution approach/design and Action/change management all rank higher. This is our experience.
In large, established “grown-up” companies, data science projects fail for one or both of two reasons:
- They are solving the wrong problem. They are building an analytic that is not what the business needs, that will not solve a true business problem or that is poorly designed to fit into the business context.
- They cannot action the model they build. They can’t change the decisions made and actions taken in the business to take advantage of the analytic.
And this illustrates the problem.
The problem is that data scientists THINK their project starts with data and ends with the communication of their analysis. If that’s your focus, then data is your #1 problem.
But this is not where data science projects start nor where they end. They have to start and end with the business. That means starting with a business problem – a business decision that the business wants to improve – and ending with that problem being solved – the business behaves differently (better). If that’s your focus, then your problem is not data but problem definition and operationalization – making the analytic work IRL.
Here’s the difference, shown on those phases. On the left, what many data scientists think their projects involve and, on the right, what they really involve.
Bottom line: If your data science team is telling you that data is their #1 problem, then they’re doing it wrong.
I’ve written about this before – check out this LinkedIn article on the study itself and this one on adopting decision modeling as a better way to define the problems your data science team is trying to solve. You might also like our recent white paper and videos on Building an Analytic Enterprise.
Feel free to connect with me on LinkedIn to message me with questions and comments. And if we can help your data science team start working on a better definition of a project, we’d love to.
This article was originally posted to LinkedIn where it has 60+ comments, 140+ reactions and 1,000+ views. Check out the discussion.