In this special guest feature, Arjun Kakkar, Vice President Strategy and Operations at Ekata, provides 9 practical and actionable principles for product managers and business leaders working to use machine learning for fraud detection. Arjun works with Ekata’s operating teams to drive customer value across e-commerce, payments, marketplaces and online lending verticals. Before Ekata, Arjun was a Principal with Booz & Company. He has a B.Tech. from IIT Bombay and an MBA from The Wharton School.
The total recorded cost of global online fraud is about $25 billion. But the real cost is at least 20 times higher because, to catch fraud, online merchants and banks often mistakenly reject legitimate customers. These false declines represent at least $500 billion in lost lifetime revenue for online commerce, not to mention a priceless amount of customer trust.
The unique characteristics of online fraud detection, including the availability of large and diverse data sets with known outcomes, repeating patterns, and a need for quick decisions, make it a good candidate for machine learning (ML). In fact, of the many problems that ML promises to solve, online fraud detection has been one of the earliest to see real adoption.
Based on my work with thousands of global merchants and payment providers with best-in-class ML teams, I provide the following 9 practical and actionable principles for Product Managers and business leaders.
DATA – Build the Virtuous Loop
Getting access to the right fraud signals and labeling data is the most challenging task, but if done right, it delivers a significant advantage to the business.
Principle 1: The model is only as good as the labels in the test and validation sets
Businesses need to develop clear definitions of fraud, label their data, and ensure each label cleanly reflects those definitions. ML methods are usually forgiving of random labeling errors in the training set, but very susceptible to systematic errors. For example, "friendly fraud," where customers tag a legitimate transaction as fraud, is usually random, but other errors, such as a human agent's labels, may be systematic.
Unlike in training, teams must try to fix even the most random labels in the test and validation sets to make them dependable enough to assess the quality of their models.
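To see why noisy test labels matter more than noisy training labels, a quick back-of-the-envelope helps: random label flips in the test set directly distort the accuracy you measure. The sketch below is illustrative; `measured_accuracy` is a made-up helper, not part of any library.

```python
def measured_accuracy(true_accuracy: float, flip_rate: float) -> float:
    """Expected accuracy observed on a test set whose labels were
    randomly flipped with probability `flip_rate`.

    A correct prediction is counted as wrong when its label flipped,
    and a wrong prediction is counted as right when its label flipped.
    """
    return true_accuracy * (1 - flip_rate) + (1 - true_accuracy) * flip_rate

# A model that is truly 95% accurate looks noticeably worse once
# the test labels contain 10% random noise:
print(round(measured_accuracy(0.95, 0.10), 2))  # 0.86
```

The distortion grows with the flip rate, which is why cleaning test and validation labels pays off even when training labels are left noisy.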
Principle 2: Access to unique features will make it hard for fraudsters to crack your model
Fraud teams are competing with fraudsters who are getting more sophisticated at recreating customer identities. The best way to catch these fraudsters is to gather unique data from multiple vendors and partners and find unique attributes that identify the real human behind the digital identity. Utilize all the data that could help with risk signaling, including device, identity, individual, and network data.
Principle 3: Make data a real asset by building a centralized data repository and keeping it secure
A centralized data repository will ensure the data science team knows what's available and can leverage it. Teams also must commit to keeping customer data secure. Follow principles aligned with the EU General Data Protection Regulation (GDPR), such as collecting only the data that the organization will use to serve customers' needs, storing it only as long as it is needed to prevent fraud, and giving customers complete control of their data. To drive customer trust, companies need to genuinely believe in these principles, not just check the box.
HUMANS – Keep the human in the loop
It is very tempting to think of ML systems for fraud prevention as replacements for humans. In our experience, the best-in-class companies continue to keep humans in the loop.
Principle 4: Human-level performance is still the gold standard and will help teams tune their models
The human-level performance of an experienced manual review team is a reasonable estimate of the best achievable model performance. Consequently, a large gap between the model training error and the human-level error is an indicator that teams need to reduce the model bias.
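This diagnosis can be sketched as a few lines of arithmetic, following the common practice of treating human-level error as a proxy for the best achievable error. The function name and thresholds below are illustrative, not a prescribed method:

```python
def diagnose(human_error: float, train_error: float, val_error: float) -> str:
    """Treat human-level error as a proxy for the best achievable error.
    A large train-vs-human gap points to bias (underfitting);
    a large validation-vs-train gap points to variance (overfitting)."""
    avoidable_bias = train_error - human_error
    variance = val_error - train_error
    return "reduce bias" if avoidable_bias >= variance else "reduce variance"

# A manual review team misses 2% of fraud; the model misses 8% on
# training data and 9% on validation, so bias is the bigger problem:
print(diagnose(0.02, 0.08, 0.09))  # reduce bias
```

Whichever gap is larger tells the team where to invest next: a bigger model or more features for bias, more data or regularization for variance.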
Principle 5: The most effective ML systems are designed to work well with humans
Knowing that machines and humans have very different capabilities, the best ML systems leverage these differences. Humans can handle outlier cases that might not have enough historical data, or situations that require significant judgment calls. For example, a business may be getting orders from a new geography or observing a unique behavior pattern. It is worth getting humans involved in these cases before generalizing the results into a new ML model.
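One common way to operationalize this division of labor is a routing rule: auto-decide only when the model is confident and the case resembles ones it has seen before, and queue everything else for manual review. The function, thresholds, and labels below are illustrative assumptions, not a reference design:

```python
def route(score: float, is_novel_segment: bool,
          low: float = 0.2, high: float = 0.8) -> str:
    """Toy routing rule: auto-decide only when the model is confident
    AND the case comes from a segment with enough history; otherwise
    send it to a human reviewer.  Thresholds are made up for the demo."""
    if is_novel_segment or low <= score <= high:
        return "manual_review"
    return "approve" if score < low else "reject"

print(route(0.05, False))  # approve        (confident, familiar)
print(route(0.55, False))  # manual_review  (uncertain score)
print(route(0.05, True))   # manual_review  (new geography/pattern)
```

The band between the two thresholds is effectively the system's "ask a human" budget; tightening it trades review cost for decision quality.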
Use bi-directional feedback to improve both the machine and human sides. Human feedback helps correct model bias and enhances the explainability of models. At the same time, ML models can provide additional information to make the human's task simpler or even help improve human skills.
Principle 6: It is the team's responsibility to find and correct for human biases in models
One of the most significant risks of ML systems is that, by design, they utilize historical data to make inferences. Humans typically label data. It should be no surprise that the data will reflect human biases, and it is the team's responsibility to try to correct for these biases.
The first step is to identify potential sources of bias and explicitly look for them in the data. Do the validation and test data sets represent the real distribution (i.e., are they free of sample bias)? Has your team included records in the test set that check the model for systematic prejudice? Start with simpler, more transparent, explainable models that are easier to audit for bias, and graduate to more complicated models over time.
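A concrete check along these lines is comparing error rates across customer segments: a model that rejects legitimate customers in one geography far more often than another deserves scrutiny. The helper below is a hypothetical sketch with made-up data, not a complete fairness audit:

```python
from collections import defaultdict

def fpr_by_segment(records):
    """Compute the false positive rate per segment from
    (segment, predicted_fraud, actually_fraud) tuples.  Large gaps
    between segments are a red flag for systematic bias."""
    fp = defaultdict(int)   # legitimate orders flagged as fraud
    neg = defaultdict(int)  # all legitimate orders seen
    for segment, predicted, actual in records:
        if not actual:
            neg[segment] += 1
            if predicted:
                fp[segment] += 1
    return {s: fp[s] / neg[s] for s in neg}

# Toy data: "BR" legitimate orders are flagged twice as often as "US" ones.
data = [("US", True, False), ("US", False, False),
        ("US", False, False), ("US", False, False),
        ("BR", True, False), ("BR", True, False),
        ("BR", False, False), ("BR", False, False)]
print(fpr_by_segment(data))  # {'US': 0.25, 'BR': 0.5}
```

The same loop generalizes to any grouping attribute the team worries about, and to other metrics such as recall per segment.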
MODELS – Experiment and iterate
ML is a powerful tool for fraud prevention, but if not done right, it is remarkably easy to build models that are counterproductive to business goals. It is vital first to agree on what the organization is optimizing for.
Principle 7: ML models need a consistent goal, a north star metric that aligns with the overarching business objective
Choose a metric that pairs a measure with a counteracting measure to protect against overcorrecting in one direction. For example, a team could decide to maximize the portion of fraud the model correctly catches (maximize "recall"), while setting an upper bound on the portion of legitimate customers that the model incorrectly tags as fraud (cap the "false positive rate").
Finally, to make the numbers tangible, estimate the resulting cost to the business based on the cost of rejecting a good customer and the cost of unidentified fraud.
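The paired metrics and the dollar estimate can all be computed from a confusion matrix. The counts and per-unit costs below are made-up inputs for illustration:

```python
def fraud_metrics(tp, fp, fn, tn, cost_false_decline, cost_missed_fraud):
    """Pair recall with a counteracting false positive rate, then
    translate both into an estimated business cost.

    tp: fraud caught        fp: good customers wrongly rejected
    fn: fraud missed        tn: good customers correctly approved
    """
    recall = tp / (tp + fn)   # share of fraud the model catches
    fpr = fp / (fp + tn)      # share of good customers it rejects
    cost = fn * cost_missed_fraud + fp * cost_false_decline
    return recall, fpr, cost

# Illustrative numbers: a false decline costs more than a missed
# fraud because it burns a customer's lifetime value.
recall, fpr, cost = fraud_metrics(tp=90, fp=50, fn=10, tn=950,
                                  cost_false_decline=500,
                                  cost_missed_fraud=100)
print(recall, fpr, cost)  # 0.9 0.05 26000
```

With costs attached, teams can compare candidate operating thresholds in dollars rather than in abstract percentages.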
Principle 8: Develop multiple models and retrain often to align with the real world of fraud
ML models try to mimic the real world, and the real world of fraud has a couple of realities that your models should handle. First, fraud characteristics can vary a lot across geographies and types of fraud. Build geo- and use-case-specific models if they perform better. Second, the real world is dynamic, and fraudsters keep evolving their tactics. Keep a constant flow of new data to retrain models to ensure that the quality of model output does not degrade over time.
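One simple way to keep that flow of fresh data feeding the model is a sliding-window retraining loop. The class, names, and cadence below are a hypothetical sketch; `fit` stands in for whatever training routine a team actually uses:

```python
from collections import deque

class RollingModel:
    """Sketch of retraining on a sliding window of recent labeled
    transactions so the model tracks evolving fraud tactics."""

    def __init__(self, window_size=10000, retrain_every=1000):
        self.window = deque(maxlen=window_size)  # drops oldest records
        self.retrain_every = retrain_every
        self.seen = 0
        self.model_version = 0

    def add_labeled(self, features, label):
        """Record a newly labeled transaction; retrain periodically."""
        self.window.append((features, label))
        self.seen += 1
        if self.seen % self.retrain_every == 0:
            self.fit(list(self.window))

    def fit(self, data):
        # Stand-in for real training; here we just bump the version.
        self.model_version += 1

m = RollingModel(window_size=5000, retrain_every=2000)
for i in range(6000):
    m.add_labeled({"amount": i}, i % 97 == 0)
print(m.model_version)  # 3  (retrained at 2000, 4000, and 6000 records)
```

The window size and retraining cadence are the knobs: shorter windows adapt faster to new tactics but forget stable patterns sooner.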
Principle 9: Learn from other ML use cases with characteristics similar to fraud
Nearly all of the ML modeling issues teams face in fraud have analogs in other fields with recommended solutions. Experiment with ideas from these analogs.
Take the example of imbalanced class distributions in fraud, where nearly all the records in the data belong to the non-fraudulent category. This problem is similar to cases such as product defect detection. Or consider the issue of the fraud model in production biasing its own output, impacting the ability to get additional data for continuous learning. This counterfactual evaluation problem is one the online advertising industry also faces, and teams will find several ideas for experimentation there.
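For the class-imbalance analog, one of the simplest borrowed techniques is random oversampling of the minority (fraud) class before training; class weights or SMOTE-style synthesis are common alternatives. The helper below is an illustrative sketch, not a recommended production recipe:

```python
import random

def oversample_minority(records, label_key="is_fraud", seed=0):
    """Naive random oversampling: duplicate minority-class records
    until the two classes are balanced.  Applied to training data
    only -- never to test or validation sets."""
    rng = random.Random(seed)
    fraud = [r for r in records if r[label_key]]
    legit = [r for r in records if not r[label_key]]
    minority, majority = sorted([fraud, legit], key=len)
    extra = [rng.choice(minority)
             for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

# 2 fraud records in 100 becomes a 98/98 balanced training set:
data = [{"is_fraud": True}] * 2 + [{"is_fraud": False}] * 98
balanced = oversample_minority(data)
print(len(balanced), sum(r["is_fraud"] for r in balanced))  # 196 98
```

The caveat in the docstring matters: resampling the test set would invalidate the metrics from Principle 7, which must reflect the real class distribution.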
To derive real value from ML for fraud detection, your team must treat ML as an organizational capability. It calls for product, engineering, data science, and privacy teams working together. A company's success will hinge on implementing working models that solve real business problems. Start small, experiment, and iteratively grow your capability. Over time, your business will not just survive, but start thriving.
Source: The Nilson Report, 2018, "Card Fraud Worldwide"