When Is a Machine Learning Model Good Enough for Production, and How to Stress About It Only Once?

November 19, 2020
in Data Science

As you start incorporating machine learning models into your end-user applications, the question comes up: “When is the model good enough to deploy?”

There simply is no single right answer.

There is no clear-cut measure of when a machine learning model is ready to be put into production, but there are a set of thought experiments that you should go through for each new model.

Identify the Goal of the Machine Learning Model

When you are trying to decide if a machine learning model is ready for deployment, it is helpful to circle back to the algorithm’s original goal. Are you trying to predict customer churn, or when you should reach out to a client? Or is your intent to automatically approve or deny someone’s credit application?

The use case for the model will determine how stringent the requirements for deployment should be. For instance, if the machine learning model will simply be suggesting things to the user, the deployment requirements will be wildly different from those for an algorithm designed to make decisions automatically.

The autonomous driving space, for example, illustrates this dilemma with its six levels of autonomy (Levels 0 through 5). As you might imagine, each subsequent level has stricter requirements.

The 6 Levels of Vehicle Autonomy Explained

Simply put, depending on the type of model, you may be able to deploy it right away or potentially need to roll it out in stages. Different models will require different thresholds, and only you can decide what is appropriate on a case-by-case basis.

Strive for X

What metric or combination of metrics decides whether your model is ready for deployment depends on the use case. Accuracy is vital to any machine learning model and is the metric most often talked about. Without accurate predictions, there is no purpose in deploying the algorithm – so strive for the best accuracy you can within reasonable limitations.

However, depending on your use case, raw test accuracy might not be the only metric to use. It might not even be the most important one. For example, let’s imagine our model decides whether someone needs surgery or not. It would be a problem if we failed to operate on all the sick people, but operating on a healthy patient would be a much more significant mistake. In cases like this, where false positives are dangerous, metrics like precision, recall, and F1 score are more appropriate than accuracy alone.
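To make the distinction concrete, here is a minimal pure-Python sketch of these metrics for the surgery example above (the labels and predictions are invented purely for illustration):

```python
# Hypothetical surgery-triage labels: 1 = needs surgery, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sick, operated: correct
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, operated: the costly mistake
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sick, missed
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, left alone: correct

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)   # of predicted surgeries, how many were truly needed
recall = tp / (tp + fn)      # of needed surgeries, how many we caught
f1 = 2 * precision * recall / (precision + recall)
```

Two models can have identical accuracy yet very different precision, which is exactly why the metric should be chosen per use case.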

But to keep it simple for this article, let’s focus on accuracy.

In practical terms, you’ll be confirming the accuracy of your model in two phases: during the training phase and a separate testing phase. To evaluate accuracy during the training phase, you’ll generally set aside a portion of your data set as validation data and use that set to validate your trained model. Like the training data, the validation data can be used several times – in this case, to find the optimal hyperparameters.
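As a sketch, the split described above might look like this (the fractions and seed are arbitrary illustrative choices, not recommendations):

```python
import random

def split_dataset(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then carve off validation and test sets.

    Validation data may be reused across hyperparameter trials;
    test data should be touched only once, at the very end.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
```

Fixing the seed makes the split reproducible, which matters once the split becomes part of an automated pipeline rather than a one-off script.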

Michael McCourt (SigOpt) wrote about the data split in ‘Practical MLOps: How to Get Ready for Production Models’

In the testing phase, you should again evaluate your trained model’s accuracy, but using data that hasn’t been used during the training iterations. This tests whether your model can generalize. In other words, can it produce accurate predictions with new data, or has it just memorized the answers for the training and validation data? When a model performs well during training but fails in testing, it is said to be overfitting.
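A simple heuristic for catching overfitting is to compare the two accuracies directly. The sketch below is illustrative only: the 0.05 threshold is an arbitrary assumption and should be tuned to your use case.

```python
def generalization_gap(train_accuracy, test_accuracy, max_gap=0.05):
    """Report the train-test accuracy gap; a large gap suggests the
    model has memorized the training data rather than generalized."""
    gap = train_accuracy - test_accuracy
    return {"gap": gap, "overfit_suspected": gap > max_gap}

# A model that aces training but stumbles on unseen data:
report = generalization_gap(train_accuracy=0.99, test_accuracy=0.82)
```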


Cassie Kozyrkov (Google) has done a fantastic, short explainer on Validating vs. Testing.

Codify Your Decisions

From an engineering perspective, the criteria you set for your model are not as important as making sure that they are codified into your machine learning pipeline. The accuracy requirements should be written in as baselines within your machine learning pipeline so you can be sure that they are adhered to. Your newly deployed model should also serve as a benchmark for future models, and you may want to always compare your new iterations against the production model in the testing phase.
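Codified, such a deployment decision can be as small as a gate function that the pipeline runs after every training. The baseline value and names below are hypothetical, chosen only to illustrate the two checks described above:

```python
ABSOLUTE_BASELINE = 0.90  # hard accuracy floor written into the pipeline

def deployment_gate(candidate_acc, production_acc=None,
                    baseline=ABSOLUTE_BASELINE):
    """Approve a candidate model only if it clears the fixed baseline
    and does not regress against the model currently in production."""
    if candidate_acc < baseline:
        return False
    if production_acc is not None and candidate_acc < production_acc:
        return False
    return True
```

Because the gate lives in code, every retrained model faces the same criteria automatically; you stress about the threshold once, when you write it down.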

Similarly, how you perform the data split between training, validation, and testing data should be part of your training pipeline, rather than a manual process or a separate script. After all, in a production setting, the purpose is not to train and deploy a single model once but to build a system that can continuously retrain and maintain the model accuracy. The pipeline is the product – not the model.
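Put together, one run of such a pipeline might be sketched like this. The `train_fn` and `eval_fn` callbacks are stand-ins for whatever your framework provides, and the split fractions and baseline are again arbitrary assumptions:

```python
import random

def retraining_pipeline(rows, train_fn, eval_fn, baseline=0.90, seed=42):
    """One pipeline run: split, train, evaluate, gate."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    test_set = rows[: n // 5]                 # touched exactly once, at the end
    val_set = rows[n // 5 : 2 * n // 5]       # may be reused for tuning
    train_set = rows[2 * n // 5 :]
    model = train_fn(train_set, val_set)
    accuracy = eval_fn(model, test_set)
    return {"accuracy": accuracy, "deploy": accuracy >= baseline}
```

Every decision – how to split, how to evaluate, when to deploy – is now versioned code rather than tribal knowledge.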

If you are interested in learning more about machine learning pipelines and MLOps, consider our other related content.


Credit: Data Science Central By: Henrik Skogström
