In the latter part of the 2000s, DevOps emerged as a set of practices and tools that combines development activities (Dev) with IT operations (Ops) in order to accelerate the development cycle while maintaining delivery efficiency and predictably high quality. The core principles of DevOps include an Agile approach to software development, with iterative, continuous, and collaborative cycles, combined with automation and self-service concepts. Best-in-class DevOps tools provide self-service configuration, automated provisioning, continuous build and integration, automated release management, and incremental testing.
Solutions for DevOps include tools for managing development communication, processes, and tasks; capabilities for testing and integration; the ability to provision servers, applications, and infrastructure; and tools for managing code, artifacts, and releases and for monitoring logs and deployments. In this way, organizations can build, test, deploy, and manage code quickly and with high degrees of visibility and quality. Given the track record of success that DevOps has had in making application development more robust, efficient, and speedy, it makes sense that developer-focused organizations want to apply DevOps approaches and methodologies to the development, deployment, and management of machine learning models.
Applying DevOps to ML
However, DevOps approaches to machine learning (ML) and AI are limited by the fact that machine learning models differ from traditional application code in many ways. For one, ML models are highly dependent on data: training data, test data, validation data, and of course, the real-world data used in inferencing. Simply building a model and pushing it to operation is not sufficient to guarantee performance. DevOps approaches for ML also treat models as “code,” which makes them somewhat blind to issues that are strictly data-based: in particular, the management of training data, the need to re-train models, and concerns around model transparency and explainability.
As organizations move their AI projects out of the lab and into production across multiple business units and functions, the processes by which models are created, operationalized, managed, governed, and versioned need to be made as reliable and predictable as the processes by which traditional application development is managed. In addition, as the markets for AI shift from those relatively few organizations that have the technical expertise required to build models from scratch to those enterprises and organizations looking to consume models built by others, the focus shifts from tooling and platforms focused solely on model development to tools and platforms focused on the overall usage, consumption, and management of models.
Implementing artificial intelligence solutions at scale can be challenging. Many organizations and public sector agencies struggle to rapidly deploy, manage, and secure the machine learning models that power the core of today’s AI solutions. Furthermore, data scientists, IT operations, data engineering, line-of-business, and ML engineering teams often work in silos, which complicates creating, managing, and deploying ML models even within a single division. These challenges compound as organizations share models across the entire organization or agency, or consume models built by third parties outside the organization. As a result, the increased complexity of dealing with multiple models in different versions from multiple sources leads to issues around model versioning, governance of models and access, potential security risks, difficulty monitoring model usage, and duplicated effort as multiple teams create very similar models.
While much of the attention up until now has been focused on the development of machine learning models, as the industry moves from innovators and early adopters to the early majority, later entrants will be more concerned with consuming models developed by others and adopting recognized best practices than with building their own models from scratch. This means that these model consumers will be primarily concerned with the quality and reliability of existing models rather than with attempting to create a data science organization and investing in the tools, technology, and people needed to build their own models.
Simply training machine learning models isn’t sufficient. Once a model has been trained, it needs to be applied to a particular problem, and it can be applied in any number of ways. The model can sit on a desktop machine providing results on demand, on the edge in a mobile device, or in a cloud or server environment providing results to a wide range of use cases. Each place where the model is put to work in a real-world situation can be considered a separate deployment, so simply saying a model is “deployed” doesn’t give us enough information. Putting machine learning models into real-world environments where they act on real-world data and provide real-world predictions (i.e., inferencing) is called “operationalizing” the models.
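To make the distinction concrete, the sketch below models one trained artifact operationalized as multiple deployments. All names here (`score`, `Deployment`, the registry keys) are hypothetical, standing in for a serialized model and an MLOps deployment record; the point is that the same model version can exist in several deployments at once, so "deployed" alone is ambiguous.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical trained model: a simple scoring function standing in
# for a serialized artifact produced by a training run.
def score(features: List[float]) -> float:
    return sum(features) / len(features)

@dataclass
class Deployment:
    """One real-world placement of a model (desktop, edge, cloud, ...)."""
    target: str           # e.g. "mobile-edge" or "cloud-api"
    model_version: str    # the artifact version this deployment serves
    predict: Callable[[List[float]], float]

# The same model version operationalized as two distinct deployments.
deployments: Dict[str, Deployment] = {
    "edge": Deployment("mobile-edge", "v1.2", score),
    "cloud": Deployment("cloud-api", "v1.2", score),
}

# Inferencing: each deployment acts on real-world data independently,
# so "the model is deployed" does not say which placement is meant.
result = deployments["cloud"].predict([0.2, 0.4, 0.6])
```

Tracking each placement as its own record is what lets an operationalization platform monitor, version, and retire deployments independently of the model artifact itself.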
Emergence of MLOps Tools and Solutions
To address these needs, tools and solutions are emerging in the market that treat ML models as distinct from code and can handle the range of ML model-specific management needs. They manage not only the process of model creation and operationalization, but also the data used at training time and in production, the performance of models over time, and specific needs for model governance, security, and transparency.
As opposed to the portion of the puzzle focused on machine learning model development, Machine Learning Model Operationalization Management, which is often referred to as “MLOps”, is focused on the lifecycle of model development and usage, and in particular, aspects of machine learning model operationalization and deployment.
The core aspects of MLOps include:
- Model Lifecycle Management – Similar to the needs for application development processes in traditional “DevOps” tools, MLOps tools need to help manage the lifecycle for model development, training, deployment, and operationalization, and provide consistent, reliable processes for moving models from the data science environment to the production environment.
- Model Versioning & Iteration – As models are consumed, they will most likely be iterated and versioned to deal with new and emerging needs as models change based on new training or real-world data. MLOps solutions provide capabilities to operationalize different versions of a model, support multiple versions in operation as needed, notify model users of version changes, provide visibility into version history, and help ensure that obsolete models are not used.
- Model Monitoring and Management – Since the real world continues to change and doesn’t always match the world captured in training data, MLOps solutions need to monitor and manage the usage, consumption, and results of models to make sure their accuracy, performance, and other measures continue to provide acceptable results. Such solutions can provide visibility into data and model “drift” and track various measures of model performance against thresholds and benchmarks.
- Model Governance – Models used in real-world situations need to be trustworthy and auditable. As such, MLOps platforms provide capabilities for auditing, compliance, governance, and access control. This includes features for model and data provenance (tracing data changes to model changes), model access control, prioritization of model access, transparency into how models use data, and support for any regulatory or compliance requirements around model usage.
- Model Discovery – MLOps solutions can provide model registries or catalogs for models produced within the tool ecosystem as well as a searchable model marketplace that provides a way to locate consumable models, both internally developed as well as third-party models. These model discovery solutions should provide sufficient information to be able to ascertain the relevance, quality, data origination, transparency of model generation, and other factors for a particular model.
- Model Security – Models are assets that need to be protected. MLOps solutions can provide functionality to protect models from being corrupted by tainted data, being overwhelmed by denial of service attacks, attacked through adversarial means, or being inappropriately accessed by unauthorized users.
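Of these core aspects, model monitoring lends itself to a compact illustration. The sketch below is illustrative only: production MLOps platforms use far richer statistics than a mean-shift test, and the `detect_drift` function and its threshold are hypothetical. It flags data drift when live feature values stray too far from the training baseline:

```python
import statistics

def detect_drift(train_values, live_values, sigmas=3.0):
    """Flag data drift when the mean of live feature values moves more
    than `sigmas` standard deviations away from the training mean."""
    base_mean = statistics.mean(train_values)
    base_std = statistics.stdev(train_values)
    shift = abs(statistics.mean(live_values) - base_mean)
    return shift > sigmas * base_std

# Baseline feature values observed during training.
train = [1.0, 1.1, 0.9, 1.05, 0.95]

stable = detect_drift(train, [1.0, 1.02, 0.98])   # live data resembles training
drifted = detect_drift(train, [1.6, 1.7, 1.5])    # live data has shifted
```

In a real platform, a check like this would run continuously against inference traffic and raise an alert (or trigger re-training) when the drift threshold is crossed.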
As detailed in a recent Cognilytica report on MLOps, the market is increasingly seeing the emergence of MLOps solutions designed to simplify the usage and consumption of various AI and ML models. These solutions will become increasingly necessary as the bulk of the market adopts AI, giving organizations the ability to use and manage pre-trained models with the same or better confidence in performance as internally built models.
The MLOps market is nascent, with technology solutions for effective model management emerging only in the last year or two. However, the MLOps market is predicted to exceed $4 billion within just a few years, and as such promises to become a major component of the AI solution landscape in short order. Indeed, it has been predicted to be a major trend even for 2020. No doubt, MLOps is here to stay.
Credit: Google News