Credit: Data Science Central
Describing the Data Science analytics development process has always been a struggle for me. The corporate world is full of linear processes. Major monolithic enterprise applications – ERP, MRP, SFA, CRM systems – have been architected to reduce operational complexity via a series of interlocking, heavily-engineered (and re-engineered) linear processes (though I suspect that anyone who has survived an SAP ERP implementation would take issue with my “reduce operational complexity” statement).
The data science development process could not be more different. It is marked by a highly non-linear, heavily iterative cycle of “rapid exploration, discovery, training, testing, failing and learning” (see Figure 1).
Figure 1: Data Science Engagement Process
While Figure 1 has been my best attempt to convey the Data Science Development process (“What’s The Difference Between BI Analyst and Data Scientist?”), it doesn’t adequately convey the uber-non-linear, almost chaotic nature of the Data Science development process.
That is, until one day, while digging through my desk drawer at home looking for a highlighter, I discovered the right analogy: Game Boy® Final Fantasy Legend II (see Figure 2).
Figure 2: Final Fantasy Legend II
Now I cannot personally attest to the other Final Fantasy games, but I can verify from personal experience that having to play and win Final Fantasy Legend II may be my closest analogy to the data science development process. But before I jump into “why the data science development process is like playing Final Fantasy Legend II”, let’s review some important Data Science concepts that help guide the data science development process.
- Data Science is about identifying those variables and metrics that might yield better predictors of performance. This is an important guiding concept because it highlights the importance of a “natural curiosity” to guide the variable and metric exploration process. Data Science is empowered by the power of “might”; that is, it’s okay (in fact, it’s necessary) to be willing to fail in pursuit of new variables and metrics that “might” be better. And remember, if you don’t have enough “might” moments, you’ll never have any “break-through” moments.
- Machine Learning and Deep Learning seek to codify the patterns, trends, associations and relationships buried in the data. There is much that Machine Learning and Deep Learning can do, but at its most fundamental level, all machine learning and deep learning do is use math to identify patterns, trends, associations and relationships buried in the data, and measure the strength of those patterns, trends, associations and relationships.
- The value of Big Data isn’t in the volume, it’s in the granularity. The most successful applications of data science capture insights (propensities, inclinations, tendencies, relationships, associations) at the level of individuals – whether humans (customers, students, teachers, patients, nurses, engineers, technicians) or devices (compressors, chillers, turbines, motors, gear boxes). Being able to create Analytic Profiles (for humans) and Digital Twins (for devices) enables predicting behaviors that can lead to monetization and automation opportunities. Check out “Cohort Analysis in the Age of Digital Twins” for more details on Analytic Profiles and Digital Twins.
- Reinforcement Learning plays a giant game of hotter-and-colder. Reinforcement Learning, one of my favorite advanced analytics algorithms, uses a simple trial-and-error concept to learn in order to maximize rewards while minimizing costs. Again, an algorithm that fully understands that the only way to learn is to try, fail, learn and try again. Check out “Transforming from Autonomous to Smart: Reinforcement Learning Basics” to learn more about Reinforcement Learning.
- AI is about codifying customer, product, operational or market patterns and relationships in order to learn, act and/or automate. Artificial Intelligence is about creating an analytic environment that continuously learns through every human or device interaction so that subsequent interactions get more effective. Check out “Artificial Intelligence is not “Fake” Intelligence” for more details.
- “It’s tough to make predictions, especially about the future.” Yogi Berra famously came up with this quote, and it perfectly sums up the data science challenge: when predicting the future, it’s hard to know when “good enough” is actually “good enough”. See the blog “Real-World Data Science Challenge: When Is ‘Good Enough’ Actually ‘Good Enough’” for more details on the challenges associated with the Data Science “good enough” determination.
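To make the hotter-and-colder idea from the Reinforcement Learning bullet above a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not a method taken from any of the referenced blogs: it assumes a toy setting with three possible actions whose average rewards are made up, and it uses a simple epsilon-greedy rule (mostly repeat what has worked best so far, occasionally try something else). The function name `hotter_colder_bandit` and every parameter value are hypothetical.

```python
import random


def hotter_colder_bandit(true_rewards, steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays best (epsilon-greedy)."""
    rng = random.Random(seed)
    n = len(true_rewards)
    estimates = [0.0] * n  # current belief about each action's reward
    counts = [0] * n       # how many times each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action, even if it "might" fail.
            action = rng.randrange(n)
        else:
            # Exploit: pick the action that currently looks "hottest".
            action = max(range(n), key=lambda a: estimates[a])

        # Noisy feedback: the "hotter" or "colder" signal.
        reward = true_rewards[action] + rng.gauss(0, 0.5)

        # Update the running average for the action just tried.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Made-up average rewards for three actions (assumption for illustration).
estimates, counts = hotter_colder_bandit([0.2, 0.5, 1.5])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("best action:", best, "tries per action:", counts)
```

The learner starts out knowing nothing, fails often in the early rounds, and gradually concentrates its tries on the action that keeps paying off, which is the try, fail, learn and try again loop in miniature.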
To summarize, see the blog “What We Can Learn about AI and Creating Smart Products from The Inc…” and have some fun doing battle with The Incredibles and learn something about Data Science, Machine Learning, Deep Learning and AI in the process (see Figure 3)!
Figure 3: Understanding how to Create AI-inspired Smart Products
Now, onto the rest of our story (cue the introductory music…)
So why do I feel that the Game Boy Final Fantasy Legend II is a great analogy for the highly non-linear data science processes?
Here are a few of my reasons, and if you have others, please share them!
- It takes a team to win the game, and the more diverse the team with different capabilities and tools, the better. Build your team based upon potential rather than current capabilities. For example, while the Robot is powerful in the earlier levels and will single-handedly win lots of battles, the Robot eventually tops out and becomes ineffective at the later, more difficult levels. Check out “A Winning Game Plan For Building Your Data Science Team” to learn more about how you can build a data science team that can win when the real battles start!
- You explore within and across levels to discover and acquire weapons (swords, axes, bows, guns), items (shields, armor, helmets, gloves) and skills (judo, cures, curses) of varying levels of power and effectiveness. However, the path to discovery is not straight or predictable. There will be several times when you will need to double back to a previous level to gather important items (and insights) that you were not capable of gathering before (because you were not strong enough or lacked certain items and skills).
- The goal in each world is to obtain enough “Magi” to progress to the next world. But the process of discovering the “Magi” is not linear. You will have to progress up levels (to build powers and gather new weapons and items), and then have to come back down to previous levels to leverage your new strengths to uncover hidden Magi. Think of the Magi as equivalent to corporate funding that enables you to continue to fund your data science journey.
- You can’t measure success by number of hours played. Just playing the game more doesn’t help win the game; you must create and test different hypotheses throughout the game to successfully advance from one level to the next. Progress and success are achieved by defining, testing and proving out one hypothesis after another.
- You will learn and get stronger with each interaction, and you will certainly fail at defeating certain enemies at certain times along the journey. But those failures provide a learning opportunity to better understand the extent of the current deficiencies in you and your team. Failing is a natural way to learn. To be a great data scientist, you cannot be afraid of failing. If you aren’t failing enough, then you’re not pushing the edges of learning enough. Check out “Why Is Data Science Different than Software Development?” to better understand the role of failure in your data science development process.
- Everyone takes a turn leading. There are certain situations where the wizard must lead, and other situations where an imp might have to lead, and others where the human must lead. Everyone on the team must be prepared and not afraid to lead depending on the situation.
- Embrace ‘unlearning.’ Just when you think you have developed the necessary skills and capabilities, and have acquired all the right items, you battle one of the boss monsters and learn that all your planning to win that battle was inadequate. The capabilities and weapons that help to defeat one type of monster may be totally irrelevant to the next monster. Be prepared to let go of what you thought was the right approach and be prepared to learn a new one. Just like real life in the data science world.
- You may have to start the quest all over again from the beginning. You may find at the later levels that the team you have assembled and the weapons and items that you have gathered are insufficient to win the final level. Yep, it’s like a stinkin’ saddle point, and your team and your strategy just top out before you can achieve victory (note: I learned the lesson about building a team around robots and Imps the hard way). Check out “What We Can Learn about AI and Creating Smart Products from The Inc…” to learn more about the challenge of saddle points.
- In the same way that the data scientist will have to try different combinations of data, data enrichment, feature engineering and algorithms to get the necessary analytic results, in Final Fantasy Legend II you will have to try different combinations of weapons, shields, casts and cures to beat certain foes. And surprise, sometimes it’s the combinations that you least expected that yield valuable insights into beating that foe.
- Finally, there will be many annoying and some very evil creatures along that path who are trying to hinder, slow down or kill your journey (just normal life in the corporate world). Build and nurture a strong collaboration with your business stakeholders and constituents who can guide you, and even help you do battle at critical times on the journey.
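For readers who have not run into the saddle point mentioned in the list above, here is a minimal Python sketch of what “topping out” can look like in an optimization. It is a toy under stated assumptions: the function `f(x, y) = x**2 - y**2`, the learning rate, the step count and the starting points are all made up for illustration, and the helper name `gradient_descent` is hypothetical.

```python
def f(x, y):
    """Toy surface with a saddle point at (0, 0): a valley along x, a ridge along y."""
    return x**2 - y**2


def gradient_descent(x, y, lr=0.1, steps=100):
    """Plain gradient descent on f, using its exact gradient (2x, -2y)."""
    for _ in range(steps):
        grad_x, grad_y = 2 * x, -2 * y
        x, y = x - lr * grad_x, y - lr * grad_y
    return x, y


# Starting exactly on the ridge line (y = 0): the search slides into the
# saddle and stalls there, because the gradient shrinks to zero.
stuck = gradient_descent(1.0, 0.0)

# A tiny nudge off the ridge is enough to keep descending past the saddle.
escaped = gradient_descent(1.0, 1e-6)

print("stuck at:", stuck, "value:", f(*stuck))
print("escaped to:", escaped, "value:", f(*escaped))
```

The first run ends up parked at a spot where every local signal says “nothing left to improve” even though far lower values exist; the second run, started from an almost identical position, keeps going. That is roughly the situation of a party that looked fine level by level and then cannot win the final battle.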
I loved this blog because it gave me a reason to dig up my old Game Boy and re-visit an old but fun adventure. And with some international travel ahead, I hope the folks sitting next to me don’t mind me grappling with a few Pirates, Imps, Ghosts, Commandos and Slime Gods.
And don’t forget to drop some coins into the jukebox in the bar so you can jam out while you “Go in search of hidden treasures”!
Key points from this blog:
- It is hard to convey the non-linear data science development process to folks who live and operate within linear, highly-reengineered processes
- Software Development defines the requirements for success; Data Science discovers them
- Data Science is a highly non-linear process that embraces an “exploration, discovery, training, testing, failing and learning” development methodology
- Understanding how to beat Game Boy® Final Fantasy Legend II provides a good analogy for how the Data Science Development process works