An intuition and tutorial on trust score
Over the years, there have been many efforts to improve deep learning performance, but relatively few works on better understanding models and their predictions, and on whether those predictions should be trusted at all.
In this article, we shall lightly probe the trustworthiness of a model in terms of its predictions. However, the term trust is vague and carries a wide range of denotations and connotations. So, for the sake of our discussion, let us limit trust to denote a “fail-safe” feature for a model’s predictions, that is, a secondary or supporting opinion on the model’s predictions.
If you are more interested in the practical stuff, you may skip ahead to the Trust Score section.
Since the re-emergence of deep neural networks in 2012 by famously winning the ImageNet Challenge (Krizhevsky et al., 2012), we have employed deep learning models in a variety of real-world applications, to the point where we resort to deep learning to solve even the simplest problems. Such applications range from recommendation systems (Cheng et al., 2016) to medical diagnosis (Gulshan et al., 2016). However, despite the state-of-the-art performance of deep learning models in these specialized tasks, they are not infallible, and the seriousness of their mistakes varies per application domain. So, the call for AI safety and trust is not surprising (Lee & See, 2004; Varshney & Alemzadeh, 2017; Saria & Subbaswamy, 2019). For years, much of the effort went into improving the performance of models, while investigation of their limitations has not received equal attention.
Despite receiving relatively less attention, there are some excellent works on better understanding model predictions, and these include but are not limited to the following: (a) the use of confidence calibration — where the outputs of a classifier are transformed to values that can be interpreted as probabilities (Platt, 1999; Zadrozny & Elkan, 2002; Guo et al., 2017), (b) the use of ensemble networks to obtain confidence estimates (Lakshminarayanan, Pritzel, & Blundell, 2017), and (c) using the softmax probabilities of a model to identify misclassifications (Hendrycks & Gimpel, 2016).
Now, the aforementioned methods all calibrate confidence using the score reported by the model itself. Enter: Trust Score. Instead of merely extending the said methods, Jiang et al. (2018) developed an approach based on topological data analysis, which provides a single score for a model's prediction, called the trust score.
Jibber-jabber aside, the trust score is simply a measure of agreement between a trained classifier f(x) and a modified nearest-neighbor classifier g(x) on their prediction for a test example x.
This agreement is measured as the ratio of the distance from x to the nearest class other than the predicted class (let's denote this class as ĥ) to the distance from x to the predicted class (let's denote it as h).
A score of 1 would mean that the predicted class h and the “closest not predicted class” ĥ are equidistant from the test example x. A score greater than 1 then implies that the prediction is trustworthy, since ĥ lies farther from x than h does, i.e. trust score = d(ĥ, x) / d(h, x).
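As a toy illustration of this ratio (the points and distances below are made up, not from the paper), we can compute it directly with NumPy:

```python
import numpy as np

# Hypothetical test example and the nearest points of two classes.
x = np.array([1.0, 1.0])
nearest_predicted = np.array([1.0, 2.0])     # nearest point of predicted class h
nearest_other = np.array([4.0, 5.0])         # nearest point of closest other class ĥ

d_h = np.linalg.norm(x - nearest_predicted)  # distance to predicted class
d_h_hat = np.linalg.norm(x - nearest_other)  # distance to closest other class

trust_score = d_h_hat / d_h
print(trust_score)  # prints 5.0: ĥ is 5x farther away, so the prediction looks trustworthy
```

A score well above 1 means the predicted class is much closer to x than any competing class; a score below 1 would flag the prediction as suspect.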
The explanation above was meant to give the intuition behind the trust score. Let us now inspect how it is actually computed.
To compute the trust score, we first need an α-high-density-set, which is obtained through Algorithm 1: it filters out the α-fraction of the training data points with the lowest empirical density, a.k.a. the data points that do not cluster together so much. Put simply, the α-high-density-set consists of the training data points that remain after probable outliers are eliminated; this is the purpose of the α parameter. Jiang et al. (2018) defined this procedure as their modification to the nearest-neighbor classifier.
The resulting α-high-density-set serves as the dataset for our nearest-neighbor classifier. However, since a (k)NN classifier has a search time of O(n), a k-d tree is used to speed up the search, reducing the search time complexity to O(log n). One k-d tree is then constructed for each class in the set.
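The per-class tree construction can be sketched as follows, using SciPy's k-d tree (the toy data and variable names here are mine, not Seldon's):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))       # toy training features
y = rng.integers(0, 3, size=100)    # toy labels for 3 classes

# One k-d tree per class, so that querying a tree gives the distance
# from a test point to that class in O(log n) time on average.
trees = {c: cKDTree(X[y == c]) for c in np.unique(y)}

x_test = np.zeros(2)
# Distance from x_test to the nearest training point of each class.
dists = {c: trees[c].query(x_test, k=1)[0] for c in trees}
```

Keeping a separate tree per class is what makes the later step cheap: the distance from x to a whole class is just one nearest-neighbor query against that class's tree.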
To do this in code, let's borrow Seldon's extended implementation of the open source code from Jiang et al. (2018). We can define the instances to keep by using np.percentile(knn_radius, (1 - alpha) * 100.0), and then repeat this process for the other classes.
From this, we should expect a training dataset whose data points of the same class cluster more tightly together, a.k.a. the α-high-density-set.
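The filtering step can be sketched like this. It is a simplified reimplementation, not Seldon's exact code, and in the actual algorithm it is applied per class; the sketch below shows it for a single class's points. We compute each point's distance to its k-th nearest neighbor (knn_radius) and keep only the points whose radius falls at or below the (1 - alpha) percentile:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def high_density_set(X, k=10, alpha=0.05):
    """Keep the (1 - alpha) fraction of X with the smallest kNN radius."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own neighbor
    dist, _ = nn.kneighbors(X)
    knn_radius = dist[:, -1]                         # distance to the k-th true neighbor
    cutoff = np.percentile(knn_radius, (1 - alpha) * 100.0)
    return X[knn_radius <= cutoff]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(95, 2)),             # a dense cluster
               rng.normal(loc=10.0, size=(5, 2))])   # a few far-away outliers
filtered = high_density_set(X, k=10, alpha=0.05)
```

With alpha=0.05, the 5% of points with the largest kNN radius (here, the far-away outliers) are dropped, leaving the dense core of the class.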
After obtaining the α-high-density-set, we can now compute the trust score of a trained model prediction — detailed in Algorithm 2.
When a trained classifier makes a prediction on a test example x, the distance between x and the k-d tree of each class is measured. The trust score is then calculated by taking the ratio of (a) the smallest distance from x to a class other than the predicted one (this class is ĥ, hence the argmin) to (b) the distance from x to the predicted class h.
We can measure the distances either by using the distance to the k-th nearest neighbor in each tree (dist_type = "point") or by using the average distance from the first to the k-th nearest neighbor (dist_type = "mean"); in this article we shall use dist_type = "point". As for the distance metric, we shall use Euclidean distance (although any distance metric may be used).
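Putting Algorithm 2 together, here is a small sketch supporting both distance types. It is my own simplified reimplementation for illustration, not the article's code:

```python
import numpy as np
from scipy.spatial import cKDTree

def trust_score(trees, x, predicted_class, k=2, dist_type="point"):
    """Ratio of the distance to the closest other class over the
    distance to the predicted class, per Jiang et al. (2018)."""
    d = {}
    for c, tree in trees.items():
        dist, _ = tree.query(x, k=k)
        dist = np.atleast_1d(dist)
        # "point": distance to the k-th neighbor; "mean": average over the first k.
        d[c] = dist[-1] if dist_type == "point" else dist.mean()
    d_pred = d.pop(predicted_class)
    closest_other = min(d, key=d.get)  # this class is ĥ
    return d[closest_other] / d_pred, closest_other

# Toy example: two classes on either side of the test point at the origin.
trees = {0: cKDTree([[0.0, 1.0], [0.0, 2.0]]),
         1: cKDTree([[5.0, 0.0], [6.0, 0.0]])}
score, h_hat = trust_score(trees, np.array([0.0, 0.0]), predicted_class=0)
# With dist_type="point" and k=2: d(h, x) = 2, d(ĥ, x) = 6, so score = 3.0.
```

Here the predicted class 0 is three times closer than class 1, so the score of 3.0 says the prediction agrees with the nearest-neighbor evidence.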
However, for visualization purposes, the score function used in this article was modified to retrieve not only the trust score and the closest not-predicted class, but also the indices of the nearest points in the predicted class and in the nearest class other than the predicted class. The modified implementation is available here.
Despite the relatively long explanation of how to compute the trust score for a model prediction, doing so takes only three lines of code (excluding the import line).
Since computing the trust score relies on kNN (and kNN suffers from the curse of dimensionality), instead of using the training features as they are, we can encode them to a lower dimension; indeed, Jiang et al. (2018) found that the trust score works best in low- and medium-dimensional feature spaces.
The encoded_train_features in the code snippet above were obtained using principal component analysis (PCA) with 64 principal components.
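The encoding step might look like the following sketch; the random stand-in data and shapes are mine (MNIST-like 784-dimensional features), not taken from the article's code:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train_features = rng.normal(size=(200, 784))  # stand-in for flattened images

# Reduce to 64 principal components so the kNN search behind the
# trust score operates in a low-dimensional space.
pca = PCA(n_components=64)
encoded_train_features = pca.fit_transform(train_features)

# Reuse the same fitted PCA for the test features:
test_features = rng.normal(size=(50, 784))
encoded_test_features = pca.transform(test_features)
```

Note that the PCA is fitted on the training features only and then reused to transform the test features, so both live in the same 64-dimensional space.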
Then, after fitting a TrustScore object, we can use it to compute trust scores on model predictions. The code snippet above lays down the canonical code for computing the trust score and getting ĥ, which also uses the PCA-encoded test features.
Summary on computing the trust score: First, we get the α-high-density-set (where probable outliers are filtered out) using Algorithm 1. Second, we compute the ratio of the distance between ĥ and x (let's denote this as d(ĥ, x)) to the distance between h and x (let's denote this as d(h, x)) using Algorithm 2.
Now that we know how to compute the trust score for a model prediction, let's put it into practice and use it to compare three deep learning models.
For this article, we go through three deep neural networks: a feed-forward neural network, a LeNet convolutional neural network (LeCun et al., 1998), and a miniature VGG convolutional neural network.
We shall implement our deep learning models using TensorFlow 2.0. The TF version used in the prepared experiments was 2.0.0-beta1, and it is recommended to install it inside a virtual environment:
pip install tensorflow==2.0.0-beta1
or if you have a GPU in your system,
pip install tensorflow-gpu==2.0.0-beta1
There are more details on installation in this guide from tensorflow.org.