This post is about making the data science team more effective and improving collaboration between data scientists and stakeholders for better outcomes.
Regardless of the specific project, agreeing on the expected outcomes and goals before beginning the work is a best practice. With machine learning (ML) models, it’s even more important for both sides to discuss the critical measures of success for the given project. Using a framework such as Objectives and Key Results (OKRs) is a great way to approach this process of aligning goals and expectations.
Understanding the problem/business context:
Some companies do this organically, while others don’t. If the data science team is new to the company, they need at least a basic understanding of the problem space. Stakeholders must explain what they want to achieve and why it is critical now. How the output will be consumed is a major part of this discussion. The following are two common output consumption scenarios:
- Consumed by external parties (e.g., an investor making credit decisions, clients using your fraud predictions to refine the products they offer to end-users, or recommendations on websites/e-commerce where the end-user interacts with the result).
- Consumed by internal customers (e.g., your sales team could use the lead-scoring model to determine who to call next, decide how much bonus pay to give each sales associate, improve the call routing for your customer support, or drive the right content to the right user on the right channel to improve user engagement).
In all the above circumstances, it’s critical to understand what is possible, what the output looks like, and how the output will be consumed by the recipient stakeholder/team. Stakeholders need to take the time to educate the data teams on the problem context as needed. This step helps avoid later mishaps: knowing the context helps the data team ask the right question behind the question and get answers quickly without spinning their wheels.
Sharing sample outcome and usage:
Once the goals/metrics are identified and agreed upon, it’s best to get stakeholders acclimated to a sample outcome before diving into the project. This is not the time to discuss the nitty-gritty details of the model because it’s too early: at this point, you have only understood the problem context and have not yet figured out which ML models work best for your scenario. Additionally, spending too much time ironing out the finer details could mean wasted time and opportunity.
If the output is a probability, a value between 0 and 1, consider providing a range of such values and observing how they translate to business decisions. When you share that the output will be a value between 0 and 1, does that make your stakeholder uncomfortable, or do they understand? If they react strongly, do they ask you more questions or simply disagree? If they simply disagree, go back to the previous step; if they ask questions, help them understand how the outputs could be used by providing options comparable to those used in the industry. If that’s too much for you, then this is a great time to seek help from a more experienced professional.
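One way to make this conversation concrete is to show how a handful of sample probabilities turn into actions at different cut-offs. The following is only a minimal sketch: the scores, the thresholds, and the "act"/"skip" labels are made-up illustrations, not values from any real model or from this post.

```python
# Hypothetical sample outputs from a model (probabilities between 0 and 1).
SAMPLE_SCORES = [0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95]


def decide(probability, threshold):
    """Turn a probability into a yes/no business action at a given cut-off."""
    return "act" if probability >= threshold else "skip"


def summarize(scores, thresholds):
    """For each candidate threshold, count how many cases would be acted on."""
    summary = {}
    for threshold in thresholds:
        decisions = [decide(p, threshold) for p in scores]
        summary[threshold] = decisions.count("act")
    return summary


if __name__ == "__main__":
    # Assumed candidate cut-offs to discuss with stakeholders.
    for threshold, n_act in summarize(SAMPLE_SCORES, [0.3, 0.5, 0.7]).items():
        print(f"threshold {threshold:.1f}: act on {n_act} of {len(SAMPLE_SCORES)} cases")
```

Walking through a small table like this lets stakeholders see that the same model output can support a cautious or an aggressive policy, and that choosing the cut-off is a business decision rather than a modeling detail.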
There are several ways to educate stakeholders on your model without directly explaining its intricate processes. You don’t need to dumb it down, but it’s also fair to say that even you may not be sure which specific path a model is choosing to derive its outcomes. If you are using an ensemble of models, then the situation is more complicated.
So, what can help here? Of course, there are several models and several forms of output, such as linear regression vs. logistic regression vs. binary classifier vs. RNN vs. CNN, etc. It’s more important to have some go-to ways of explaining the model that build trust and improve collaboration through a confidence-building process. Here are some examples:
- Share a well-known use case outside (or inside) your company: How is your use case, and therefore the model, different from another one (for example, an Amazon product recommendation system)? This can help the stakeholder take a different perspective, as they are not judging you on the more familiar business context.
- Explain the inputs and data sources: Bring visibility into what’s gone into building the outputs. This is also beneficial for two other reasons: (a) asking for further investment to refine the inputs, such as collecting more data, new data, or higher-quality data sources; and (b) setting expectations about the work that has gone in beyond model building. Sometimes it’s forgotten that data gathering, not the model itself, is the most important part (depending on the context).
- Remove chances for misinterpretation: A probability value is not the same as a ranking score. Help stakeholders understand that what matters is not simply the output itself but also the chance of error and the variance in that output.
- Run simulations on the historical dataset: Before pushing models into production, you can build trust by running both processes on historical data and sharing the inputs and outputs of the old process alongside those of the new process, i.e., the ML models. If the results are comparable, this builds confidence.
- Host periodic roadshows: It’s a good practice to share the improvements you and your team are making via internal roadshows and presentations. This builds visibility and transparency into your process and further increases the chances of collaboration with the rest of the departments.
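The historical simulation mentioned in the list above can be as simple as scoring the old process and the new model on the same past cases. The sketch below assumes a tiny, invented set of records and a 0.5 cut-off purely for illustration; the field layout and the `backtest` helper are hypothetical, and accuracy is just one of many metrics you might agree on with stakeholders.

```python
# Hypothetical historical records:
# (old_process_said_yes, model_probability, actual_outcome)
HISTORY = [
    (True, 0.90, True),
    (True, 0.40, False),
    (False, 0.70, True),
    (False, 0.60, False),
    (True, 0.80, True),
    (False, 0.30, False),
]


def accuracy(predictions, actuals):
    """Share of cases where the prediction matched what actually happened."""
    matches = sum(p == a for p, a in zip(predictions, actuals))
    return matches / len(actuals)


def backtest(history, threshold=0.5):
    """Compare the old process and the thresholded model on the same past cases."""
    actuals = [actual for _, _, actual in history]
    old_predictions = [old for old, _, _ in history]
    new_predictions = [prob >= threshold for _, prob, _ in history]
    return {
        "old_process": accuracy(old_predictions, actuals),
        "new_model": accuracy(new_predictions, actuals),
    }


if __name__ == "__main__":
    for name, score in backtest(HISTORY).items():
        print(f"{name}: {score:.0%} of historical cases matched the actual outcome")
```

Sharing the side-by-side numbers, along with the individual cases where the two processes disagree, tends to be easier for stakeholders to engage with than a description of the model itself.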
Continuous collaboration and improvement:
Model building is not a one-time effort but an iterative one. If the model isn’t performing as expected from the get-go, keeping the stakeholders in the know can help you get the much-needed feedback loop going. For your model to do its job, or to validate whether you picked the right model, you need to feed the learnings from your production systems back into it. This may require further help from your end-users, internal customers, and stakeholders. There may be critical budget constraints in the company, so how much further model improvement to pursue is a decision for the data science team and the stakeholders to make together.
Hopefully the above are some useful principles and steps for the data science team to align with the rest of the stakeholders and start impacting key business metrics. Happy collaboration!