NikolaNews

How To Handle Hidden Technical Debt In A Machine Learning Pipeline

March 20, 2020
in Machine Learning

It is humbling to think of the number of tools, languages, techniques and applications a machine learning ecosystem has nurtured. Choosing the best fit out of these hundreds of options and then bringing them together to work seamlessly is a data scientist’s nightmare. The hidden technical debts in a machine learning (ML) pipeline can incur massive maintenance costs.

According to a paper presented by researchers at Google, there are several ML-specific risk factors to account for in system design:

  • Boundary erosion
  • Entanglement
  • Hidden feedback loops
  • Undeclared consumers
  • Data dependencies
  • Configuration issues

Technical debt, a metaphor popularised by Ward Cunningham in 1992, represents the long-term costs incurred by moving quickly in software engineering. Hidden debt is dangerous because it compounds silently and can eventually deliver a fatal blow to the system.

In their review, the authors draw parallels between software engineering principles and those of machine learning to investigate the recurring issues. Here are a few key takeaways:

Ripple Effect: Everything Is Connected

Changing Anything Changes Everything, or the CACE principle as the researchers call it, captures the fact that no input to an ML system is ever truly independent: changing one input can shift how the model uses all the others. The principle applies not only to input signals but also to hyper-parameters, learning settings, sampling methods, convergence thresholds, data selection, and essentially every other possible tweak.
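
To make the CACE principle concrete, here is a minimal sketch using plain least squares on made-up toy data. The function names and the numbers are assumptions for illustration only and do not come from the paper. It shows that the weight learned for one feature changes as soon as a second, correlated feature is added to the model.

```python
def fit_one_feature(x, y):
    """Least-squares weight for y ~ w * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_two_features(x1, x2, y):
    """Least-squares weights for y ~ w1 * x1 + w2 * x2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    b1 = sum(a * t for a, t in zip(x1, y))
    b2 = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are collinear; weights are not unique")
    return (b1 * s22 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det


# Toy data (made up): the target depends on two correlated features.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 1.0, 2.0, 2.0]
y = [2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

w_alone = fit_one_feature(x1, y)      # x1 alone absorbs part of x2's effect
w1, w2 = fit_two_features(x1, x2, y)  # adding x2 changes the weight on x1
print(w_alone, w1, w2)
```

The weight on x1 is noticeably larger when x1 is the only feature than when x2 is also present, even though nothing about x1 itself has changed.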

One recommended mitigation is to isolate models and serve ensembles. This approach works well where sub-problems decompose naturally, such as in disjoint multi-class settings.

Another way to keep track of changes is to use a high-dimensional visualisation tool that lets researchers quickly see effects across many dimensions and slicings. Several such tools have been developed, including Google’s TensorBoard and Facebook’s HiPlot.

Weeding Out Stale Code And Pipeline Jungles

Any ML pipeline begins with data collection and preparation. During this preparation, operations like scraping, sampling and joining tend to accumulate in a haphazard way that resembles a jungle: a pipeline jungle. Things get even worse when experimental code has been left forgotten in the codebase. Such stale code can be triggered unexpectedly and cause the system to malfunction. A malfunctioning algorithm can crash stock markets or self-driving cars, so the risk in the ML context is simply too high.

Feature flags help keep track of experimental code paths so that dead code can be identified and removed regularly. Since doing this by hand is tedious, Uber, a multinational ride-hailing company, has come up with an automated tool called Piranha. This tool helps developers bring down the hammer on obsolete code.
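
As a rough illustration of the idea, the sketch below keeps a small registry of feature flags and reports the ones whose rollout finished long ago, which are the natural candidates for clean-up. It is not Piranha's API; the `FeatureFlag` class, the `stale_flags` helper, and the 30-day threshold are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FeatureFlag:
    """Hypothetical record describing one experimental code path."""
    name: str
    enabled: bool
    fully_rolled_out_on: Optional[date] = None  # None while the experiment is still running


def stale_flags(flags, today, max_age_days=30):
    """Return names of flags whose rollout completed more than max_age_days ago."""
    stale = []
    for flag in flags:
        if flag.fully_rolled_out_on is None:
            continue  # experiment still active, so the flag is still needed
        if (today - flag.fully_rolled_out_on).days > max_age_days:
            stale.append(flag.name)
    return stale


flags = [
    FeatureFlag("new_ranker", enabled=True, fully_rolled_out_on=date(2020, 1, 1)),
    FeatureFlag("fresh_sampler", enabled=True, fully_rolled_out_on=date(2020, 3, 10)),
    FeatureFlag("experimental_loss", enabled=False),
]
print(stale_flags(flags, today=date(2020, 3, 20)))
```

A report like this only points at the code guarded by old flags; deleting the dead branches is still a separate step, whether done by hand or by an automated tool.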

Customisation Is Not Always Cool

With so many languages and tools available, teams can pick their method of choice and stitch together several languages in one pipeline. This complicates testing and makes it difficult to share the model across the organisation. Another overlooked cost, the authors warn, is excessive prototyping. New ideas are usually tried out with prototype models. However, maintaining too many small-scale prototypes can be costly and can even blind one to the pitfalls of large-scale deployment.

Configuration Can Be Costly

As systems mature, they usually end up with a wide range of configurable options such as the features used, how data is selected, algorithm-specific learning settings, verification methods, etc.

The number of lines of configuration can far exceed the number of lines of the traditional code.

A sample configuration may record where a certain feature was logged from and whether it is logged correctly, whether a feature is available in production, whether some training jobs should be allocated extra memory, and a great many other details.

In a messy setup, these tiny details make configuration almost impossible to manage. The authors suggest that configurations should be carefully reviewed and stored in a repository. A good configuration system should be easy to visualise and verify, and should leave little room for oversight.
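
One way to make a configuration easier to verify is to express it as typed data and run automatic checks before a training job starts. The sketch below is only an assumption about what such a check could look like; `FeatureSpec`, `TrainingConfig`, their fields, and the `validate` helper are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical description of one input feature."""
    name: str
    logged_from: str               # which system or log the feature comes from
    available_in_production: bool


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical training-job configuration."""
    features: Tuple[FeatureSpec, ...]
    memory_gb: int = 4


def validate(config: TrainingConfig) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config passed."""
    problems = []
    seen = set()
    for spec in config.features:
        if spec.name in seen:
            problems.append(f"duplicate feature: {spec.name}")
        seen.add(spec.name)
        if not spec.logged_from:
            problems.append(f"feature {spec.name} has no logging source")
        if not spec.available_in_production:
            problems.append(f"feature {spec.name} is not available in production")
    if config.memory_gb <= 0:
        problems.append("memory_gb must be positive")
    return problems


config = TrainingConfig(
    features=(
        FeatureSpec("user_age", logged_from="profile_service", available_in_production=True),
        FeatureSpec("last_click", logged_from="", available_in_production=False),
    ),
    memory_gb=8,
)
for problem in validate(config):
    print(problem)
```

Because the configuration is ordinary code and data, it can be kept in a repository and reviewed like any other change.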

Knowing Where To Look

Any model in production needs to be continuously monitored. There is always a lot of talk about cross-validation of systems. But are there any simple diagnostics, a good starting point that gives a fair, if not comprehensive, idea of what is going on?

A useful heuristic, the authors recommend, is to check whether the distribution of predicted labels matches the distribution of observed labels. This rule of thumb can help detect black swan scenarios in which real-world data no longer resembles the historical data on which the model was trained. The check can also serve as the basis for an automatic alert system.
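
A minimal sketch of such a check follows. It compares the two label distributions using total variation distance and raises an alert when the gap exceeds a threshold. The choice of distance measure, the function names, and the 0.1 threshold are assumptions made for this example; the paper does not prescribe a specific metric.

```python
from collections import Counter


def label_distribution(labels):
    """Map each label to its relative frequency in the given sequence."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two distributions (0 = identical)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


def prediction_bias_alert(predicted, observed, threshold=0.1):
    """Return True when predicted and observed label distributions drift apart."""
    distance = total_variation_distance(
        label_distribution(predicted), label_distribution(observed)
    )
    return distance > threshold


# Made-up example: the model predicts "spam" far more often than it is observed.
predicted = ["spam"] * 5 + ["ham"] * 5
observed = ["spam"] * 2 + ["ham"] * 8
print(prediction_bias_alert(predicted, observed))
```

In practice the check would run on a recent window of production traffic, and the threshold would need tuning so that ordinary fluctuations do not trigger alerts.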

Cutting Debts

The scenarios above cover only some of the technical debts that can creep into an ML system: configuration debt, data dependency debt, monitoring debt, management debt and many more. This collection of debts becomes more complicated as ecosystems support multiple models together. So, it is advisable to be aware of all possible vulnerabilities and to keep checking for them regularly.

For starters, the researchers list the following questions; answering them might help you build robust ML systems:

  • How easily can an entirely new algorithmic approach be tested at full scale?
  • What is the transitive closure of all data dependencies?
  • How precisely can the impact of a new change to the system be measured?
  • Does improving one model or signal degrade others?
  • How quickly can new members of the team be brought up to speed?
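
The question about the transitive closure of data dependencies can be answered mechanically once dependencies are written down as a graph. The sketch below assumes a hypothetical dictionary that maps each model or dataset to its direct inputs; the node names are invented for illustration.

```python
def transitive_dependencies(graph, node):
    """Return every direct and indirect dependency of node in a dependency graph.

    graph maps a name to the list of names it directly depends on.
    """
    seen = set()
    stack = list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; this also guards against cycles
        seen.add(current)
        stack.extend(graph.get(current, ()))
    return seen


# Hypothetical dependency graph for a click-through-rate model.
graph = {
    "ctr_model": ["user_features", "ad_features"],
    "user_features": ["click_log"],
    "ad_features": ["click_log", "ad_catalog"],
}
print(sorted(transitive_dependencies(graph, "ctr_model")))
```

Listing the full closure makes it visible that a change to a shared upstream source, such as the click log in this example, reaches the model through more than one path.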

Read the original NeurIPS paper here.

Credit: Google News


© 2019 NikolaNews.com - Global Tech Updates
