Credit: Google News
The issue of AI Trust
My ten-year-old daughter recently told me that she does not expect to get a driving license. When I asked her why, she explained that by the time she is old enough to drive, she expects all cars to be self-driving. She further elaborated that she would need to be even more vigilant in a self-driving car because, in her words, “I always know what I am thinking. But who knows what those cars are thinking!” As AI and ML permeate more businesses and become part of daily life, natural human fears are being expressed in various ways, from individual to corporate concerns to government regulations. News stories appear daily, detailing AI mistakes that led to corporate losses or embarrassment [2, 3, 4, 7]. Other examples include harm to human health and life, and the appearance of bias and unfair practices [6, 7].
Corporations are taking steps to regulate AI behaviors. A recent example is Amazon’s decision to stop an AI human-resources tool due to bias against women. New regulations and reviews are emerging [8, 9, 10]. A good example is New York City, which set a precedent in 2018 by creating a task force to examine and audit algorithmic use. These and similar initiatives demonstrate how individuals, corporations, and governments are struggling to manage risk while encouraging the tremendous potential of AI technologies.
As production AI usage grows, good operational ML (MLOps) practices will be needed to ensure that production ML systems deliver and maintain quality, so that users combine AI benefits with growing trust in their AI systems. This blog describes a key component of such a practice: ML Integrity.
ML Integrity: A Necessary Condition for AI Trust
These concerns demonstrate a lack of trust, which leads to the question: how does one grow trust in a new technology? Many have weighed in on this topic [examples in 11, 12] and pointed out that trust is a complex concept that intertwines correct operation with social values (such as whether a decision made by an AI is morally good in a human context). While trust has many facets, a core component is integrity. Integrity may not be a sufficient criterion for trust: a system that demonstrates integrity operates correctly as defined by its designers, but whether that is enough may depend on whether the designers made decisions that society considers acceptable. However, integrity is a necessary criterion for trust: a system that does not demonstrate integrity cannot be trusted, since it is not executing as planned.
Other domains, including other software arenas, have established the concept of integrity as a core element of trust. For example, Wikipedia defines Computer System Integrity as:
- That condition of a system wherein its mandated operational and technical parameters are within the prescribed limits.
- The quality of an Automated Information System when it performs its intended function in an unimpaired manner, free from deliberate or inadvertent unauthorized manipulation of the system.
- The state that exists when there is complete assurance that under all conditions an IT system is based on the logical correctness and reliability of the operating system, the logical completeness of the hardware and software that implement the protection mechanisms, and data integrity.
In order for humans to trust an AI, the algorithms, software, and production deployment systems that train AI models and deliver AI predictions must behave within specified norms and in a predictable and understandable way, i.e. with ML Integrity. ML Integrity is the core criterion that a machine learning (or deep learning, reinforcement learning, etc.) algorithm must demonstrate in practice and in production to drive trust.
Four pillars of ML Integrity in Production Systems
Production ML systems have many moving parts. At the center is the Model, the trained AI algorithm that is used to generate predictions. However models have to be trained and then deployed in production. Many factors (example: the training dataset, training hyper-parameters, production code implementation, and incoming live production data) combine to generate a prediction. For this entire system and flow to behave with integrity, four pillars must be established. These are shown in the figure below:
- ML Health: the ML model and production deployment system must be healthy, i.e. behaving in production as expected and within norms specified by the data scientist.
- ML Explainability: it must be possible to determine why the ML algorithm behaved the way that it did for any particular prediction and what factors led to the prediction.
- ML Security: the ML algorithm must remain healthy and explainable in the face of malicious or non-malicious attacks, i.e. efforts to change or manipulate its behavior.
- ML Reproducibility: All predictions must be reproducible. If an outcome cannot be faithfully reproduced, there is no way to know for sure what led to the outcome or debug issues.
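To make the ML Health pillar concrete, here is a minimal sketch of how a production system might continuously check that a model's predictions stay within norms specified by the data scientist. The `PredictionHealthMonitor` class, its parameters, and the example norm (an expected positive-prediction rate) are illustrative assumptions of this sketch, not part of any specific MLOps product:

```python
from collections import deque

class PredictionHealthMonitor:
    """Track recent predictions and flag when they drift outside
    the norms specified by the data scientist."""

    def __init__(self, expected_positive_rate, tolerance, window=100):
        self.expected = expected_positive_rate
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)  # sliding window of predictions

    def observe(self, prediction):
        """Record one binary prediction (0 or 1) from production."""
        self.recent.append(prediction)

    def is_healthy(self):
        """True while the observed positive rate is within tolerance."""
        if len(self.recent) < self.recent.maxlen:
            return True  # not enough data to judge yet
        rate = sum(self.recent) / len(self.recent)
        return abs(rate - self.expected) <= self.tolerance

# The data scientist expects roughly 10% positive predictions.
monitor = PredictionHealthMonitor(expected_positive_rate=0.1, tolerance=0.05)

# In production the model suddenly predicts all-negative:
for _ in range(100):
    monitor.observe(0)

print(monitor.is_healthy())  # prints False: outside the specified norm
```

Note that nothing has crashed here; the system is failing silently, which is exactly why standard failure detection cannot catch this class of problem.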
Each pillar is challenging in its own right and requires specialized practices:
- ML Health: Standard failure detection techniques cannot detect suboptimal prediction patterns. Production systems need specialized techniques to assess whether the models used for prediction are behaving optimally. For example, Data Deviation detectors can show when production data differs from patterns used during training, indicating that models may not have adequate information to make predictions [14, 15].
- Explainability: While some algorithms (like Decision Trees) are explainable (i.e. one can provide a human-interpretable explanation for how the prediction was generated), others, like Neural Networks, are not (yet) explainable.
- Security: ML generates new security challenges at both the algorithmic and data management levels [16,17,18,19]. Corruption of datasets can cause a model to be mistrained and then generate damaging predictions during use. Similarly, studies have shown that ML models can be fooled into making incorrect predictions by distorting incoming data.
- Reproducibility: the large number of artifacts, algorithm settings, code versions, system parameters, datasets, etc. that contribute to a single prediction can make reproducibility challenging. To make AI truly reproducible, a precise lineage and provenance must be maintained for every prediction.
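As a concrete illustration of the kind of Data Deviation detector mentioned in the ML Health bullet above, the sketch below compares a production feature's distribution against its training distribution using a two-sample Kolmogorov-Smirnov statistic. This is a minimal, pure-Python sketch for a single numeric feature; the function names and the 0.2 alert threshold are assumptions for illustration, not a reference implementation:

```python
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = sum(x <= v for x in a) / len(a)
        cdf_b = sum(x <= v for x in b) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

def deviation_alert(training_feature, production_feature, threshold=0.2):
    """Flag a feature whose production distribution has drifted away
    from what the model saw during training."""
    return ks_statistic(training_feature, production_feature) > threshold

random.seed(0)
training = [random.gauss(0, 1) for _ in range(500)]   # data used to train
stable = [random.gauss(0, 1) for _ in range(500)]     # production, same distribution
drifted = [random.gauss(2, 1) for _ in range(500)]    # production, mean has shifted

print(deviation_alert(training, stable))   # no alert expected
print(deviation_alert(training, drifted))  # alert expected: model may be blind here
```

When the alert fires, the model is receiving inputs unlike anything in its training data, so its predictions in that region are no longer backed by evidence, even though the serving system itself reports no errors.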
Furthermore, these pillars are not independent. For example, research presented at IEEE ICMLA last year showed that deviations in production data can even cause explainability techniques to lose validity.
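To make the reproducibility pillar more concrete, the sketch below shows one way a serving system might capture the lineage of a single prediction: the model version, content hashes of the training data and the input, and the hyper-parameters used. The `record_lineage` and `fingerprint` helpers and the field names are hypothetical, chosen for illustration only:

```python
import hashlib
import json
from datetime import datetime, timezone

def fingerprint(obj):
    """Stable content hash so any change to an artifact is detectable."""
    payload = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

def record_lineage(model_version, training_data, hyperparams, input_row, prediction):
    """Capture, in one record, what would be needed to trace and
    reproduce a single prediction."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "training_data_hash": fingerprint(training_data),
        "hyperparams": hyperparams,
        "input_hash": fingerprint(input_row),
        "input": input_row,
        "prediction": prediction,
    }

# Hypothetical example: logging one prediction from a deployed model.
entry = record_lineage(
    model_version="churn-model:1.4.2",
    training_data=[[0.1, 1], [0.9, 0]],
    hyperparams={"learning_rate": 0.01, "epochs": 20},
    input_row=[0.42, 0.17],
    prediction=0.83,
)
print(json.dumps(entry, indent=2))
```

In a real deployment such records would be written to durable storage for every prediction, so that any outcome can later be traced back to the exact model, data, and settings that produced it.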
What about Performance?
All of the above areas can affect the quality of an ML algorithm's output, i.e. whether the ML algorithm generates “acceptably good” predictions. An orthogonal but also important area is performance, i.e. whether the desired prediction was generated as quickly as needed. Performance is a critical element that is complementary and orthogonal to integrity. Even a high-integrity ML application will not be usable if it cannot generate predictions fast enough. However, the reverse is also true: predictions that cannot be trusted are not usable (and are dangerous) even if they arrive at perfect speeds.
Driving ML Integrity in 2019 and beyond
Each of these areas is a focus of research and practice optimization across the industry. Forums like the new USENIX Conference on Operational ML provide venues where these topics can be debated and best practices shared and defined. Industry vendors and institutions are also defining MLOps: best practices for ML algorithm deployment, testing, monitoring, and lifecycle management that can help organizations scale their ML production initiatives while maintaining ML Integrity [1, 23, 24].
I believe that 2019 is the breakout year for ML in production. As model development tools mature and more data scientists are trained, the number of models straining to get into production will only grow. As industries attempt to balance their desire for innovation with their need for risk management, MLOps practices will be needed to ensure that production ML systems deliver and maintain ML Integrity, so that users combine AI benefits with growing trust in their AI systems.
- Ghanta, S. ML Health: Taking the Pulse of ML Pipelines in Production. Grace Hopper Conference 2018. https://s3.amazonaws.com/prg-s3-production/app/uploads/42182/20180926090023-GHC_presentation.pdf
- Model Governance: Reducing the Anarchy of Production ML. USENIX ATC 2018. https://www.usenix.org/conference/atc18/presentation/sridhar
- Why MLOps (and not just ML) is your Business’ new Competitive Frontier. https://aibusiness.com/mlops-parallelm-competitive-edge/
- Operational Machine Learning: Seven Considerations for Successful MLOps. https://www.kdnuggets.com/2018/04/operational-machine-learning-successful-mlops.html