In our current age of digitization and AI, data is growing at an astonishing rate. Everyone agrees. And everyone expects AI to have a dramatic impact on our society and workplaces. Well, AI is already having a major impact across a range of real-world scenarios. But these scenarios are relatively narrow, limiting the impact of AI to a selected set of use cases.
When we scale up and start to use AI in more comprehensive applications, for example to support everyday decisions and assess complex data, AI implementations often fail. This is not a new scenario; it’s very similar to other large IT projects, such as ERP and CRM implementations. When you scale up the size and complexity of the problem you’re dealing with, your problems grow with it, and often disproportionately.
We also seem to agree that data is key to resolving these issues. The phrase “data is the new corporate currency” is thrown around regularly nowadays. So, how are we going to capitalize on this data? The obvious answer is to “mine” our data.
However, I would say we need to “mind” our data to get the best data models (and results) possible.
In fact, the data model is the one element consistently underestimated by today’s enterprises. If you ever want to truly benefit from AI in your organization, you must understand the data models used in your company.
This is an oversight that crops up time and time again. On a regular basis, I meet all sorts of IT professionals. And it baffles me that none of these IT professionals can confidently raise their hand when I ask these simple questions:
- Do you have a data model for your organization?
- And what is a data model anyway?
Let’s start with the last question first. According to Wikipedia:
“A data model (or datamodel) is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real-world entities.”
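To make that definition a little more concrete, here is a minimal sketch in Python. The entities (`Customer`, `Order`), their fields and the helper `orphan_orders` are all hypothetical, invented purely for illustration; a real organizational data model would be far larger and would normally live in a modelling tool or database schema rather than in a script.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:
    """A real-world entity: someone the business sells to (hypothetical)."""
    customer_id: int
    name: str
    country: str


@dataclass(frozen=True)
class Order:
    """An order belongs to exactly one customer (hypothetical rule)."""
    order_id: int
    customer_id: int  # relationship: refers to Customer.customer_id
    order_date: date
    amount_eur: float


def orphan_orders(customers, orders):
    """Return the orders whose customer_id matches no known customer."""
    known_ids = {c.customer_id for c in customers}
    return [o for o in orders if o.customer_id not in known_ids]


customers = [Customer(1, "Acme GmbH", "DE"), Customer(2, "Globex BV", "NL")]
orders = [
    Order(100, 1, date(2021, 3, 1), 250.0),
    Order(101, 3, date(2021, 3, 2), 99.5),  # customer 3 does not exist
]

broken = orphan_orders(customers, orders)
print([o.order_id for o in broken])  # → [101]
```

Even in this toy form, the model does what the definition describes: it names the elements of data, fixes how they relate to one another, and lets you check whether the actual data respects those relationships.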
OK, in defence of every IT professional out there, creating a data model is a pretty challenging task. The size and complexity of your systems, both on and off the cloud, are all conspiring against you.
For any type of analytics project, basic or advanced, understanding your data structures is just the starting point. To create accurate decision support processes and systems, you require a deep understanding of these data structures and how your business processes are reflected in the data.
But data is now a big and difficult beast to tame. A data scientist’s work is split roughly into 90% data tasks and 10% science tasks. Extracting, cleaning, combining and transforming data is the most resource-intensive challenge in many AI (and BI) initiatives.
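As a rough illustration of where that effort goes, the sketch below walks through the four steps on two tiny, made-up CSV exports. The data, the column names and the helper functions are assumptions chosen for this example only; they do not describe any particular system or tool.

```python
import csv
import io

# Hypothetical exports from two systems (a CRM and a billing tool).
CRM_CSV = """customer_id,name
1,  Acme GmbH
2,Globex BV
2,Globex BV
"""
BILLING_CSV = """customer_id,amount
1,250.00
1,100.50
2,n/a
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_customers(rows):
    """Clean: strip stray whitespace and drop duplicate customer ids."""
    customers = {}
    for row in rows:
        customers.setdefault(int(row["customer_id"]), row["name"].strip())
    return customers


def clean_invoices(rows):
    """Clean: keep only rows whose amount parses as a number."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((int(row["customer_id"]), float(row["amount"])))
        except ValueError:
            continue  # e.g. "n/a" in the amount column
    return cleaned


def revenue_per_customer(customers, invoices):
    """Combine and transform: join on customer_id and total the amounts."""
    totals = {name: 0.0 for name in customers.values()}
    for customer_id, amount in invoices:
        if customer_id in customers:
            totals[customers[customer_id]] += amount
    return totals


customers = clean_customers(extract(CRM_CSV))
invoices = clean_invoices(extract(BILLING_CSV))
totals = revenue_per_customer(customers, invoices)
print(totals)  # → {'Acme GmbH': 350.5, 'Globex BV': 0.0}
```

Note that none of these lines involve any "science" yet. Every function depends on knowing which columns exist, what they mean and how the two sources relate, which is exactly the knowledge a data model is supposed to capture.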
For me, this is one of the big paradoxes of data science. On the one hand, we evangelize the gospel of AI and, on the other hand, we seem to forget the fundamentals of good data management.
For any kind of advisor or consultant, your data is the starting point, whether you’re handing out a simple piece of advice or planning a major digital transformation.
So, when embarking on any data science initiative, the first question to ask is: “Can you show me your data model?”