Credit: Google News
Statistical inference is intimately tied to probability distributions – Gaussian, Poisson, Binomial etc. are evidence-backed density or mass functions corresponding to specific event characteristics. There are application domains where algorithmic approaches are wholly appropriate (e.g. genetic algorithms in robotics), and even necessary (e.g. neural networks for image classification), when it is difficult to operationalize a probability density (and the scope of data and context is contained).
The illustration above shows why Data Scientists applying the latest algorithmic approach in a domain with known probability densities risk sacrificing predictive power for model accuracy. They are either seduced by the latest “fad” or unaware that algorithm-based approaches perform poorly beyond the limits of their input data range. More importantly, model accuracy should not be an end in itself, because accuracy and overfitting are two sides of the same coin.
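The point about input data range can be made concrete with a toy sketch. Everything in it is an assumption for illustration only: the data-generating process (a straight line with Gaussian noise), the training range of 0 to 10, and the use of a k-nearest-neighbour average as a stand-in for a purely algorithmic method. It compares the two approaches at a point well outside the training range.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def knn_predict(xs, ys, x_new, k=5):
    """Average the targets of the k training points nearest to x_new."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k

# Assumed process: y = 2x + 1 plus Gaussian noise, observed only on [0, 10].
rng = random.Random(0)
xs = [i / 10 for i in range(101)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

a, b = fit_line(xs, ys)
x_out = 20.0                      # far outside the training range
true_value = 2 * x_out + 1
linear_pred = a + b * x_out
knn_pred = knn_predict(xs, ys, x_out)

print(f"true value:        {true_value:.1f}")
print(f"linear prediction: {linear_pred:.1f}")
print(f"k-NN prediction:   {knn_pred:.1f}")
```

Within the training range both methods fit well. Outside it, the nearest-neighbour average can only repeat what it saw near the edge of the data, while the model that encodes the assumed structure keeps tracking the trend.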
All this means it is more important than ever to practice the three criteria of model evaluation that are typically mentioned in passing at the beginning of Research / Statistics classes, and given scant attention afterwards.
Parsimony – the simplest possible model is the best model. This recognizes that adding more variables leads to overfitting at some point, and that each method introduces its own biases and data assumptions.
Validity – addressing potential biases in the data, and triangulating results with external sources rather than accepting model-generated metrics as truth.
Reliability – the extent to which the results are replicable in different / real-world contexts. The current reliance on accuracy metrics in Machine Learning parallels the reliance on p-values by the scientific community. By itself, a highly accurate model or a highly statistically significant study does not guarantee similar performance in a different / real-world context.
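A minimal sketch of why in-sample accuracy alone says little about parsimony or reliability is given below. It assumes a simple noisy linear process and uses a 1-nearest-neighbour rule as a hypothetical stand-in for a maximally flexible model; none of these choices come from a specific study. The flexible model scores perfectly on the data it was fitted to, yet does worse than the two-parameter line on fresh data from the same process.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def one_nn(train_x, train_y, x):
    """Return the target of the single closest training point (memorization)."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

rng = random.Random(1)

def sample(n):
    # Assumed process: y = 2x + 1 plus Gaussian noise with unit variance.
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys

train_x, train_y = sample(300)
test_x, test_y = sample(300)      # fresh draw standing in for a new context

a, b = fit_line(train_x, train_y)
scores = {
    "linear": (
        mse([a + b * x for x in train_x], train_y),
        mse([a + b * x for x in test_x], test_y),
    ),
    "1-NN": (
        mse([one_nn(train_x, train_y, x) for x in train_x], train_y),
        mse([one_nn(train_x, train_y, x) for x in test_x], test_y),
    ),
}

for name, (train_err, test_err) in scores.items():
    print(f"{name}: train MSE {train_err:.2f}, held-out MSE {test_err:.2f}")
```

The zero training error of the memorizing model is exactly the kind of model-generated metric that should be checked against data the model has not seen.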
This problem of an ever-growing body of unreplicable published research prompted the ASA (American Statistical Association) to issue a statement on p-values in 2016, essentially saying they should not be used as the sole basis of evaluation, and are not a substitute for scientific reasoning.
Data Scientists, ML experts and others should take heed as well (particularly in domains dealing with stochastic processes, shifting time series etc.), as it typically takes less than 10% input noise/variance to break predictions.
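One way to check how sensitive a model is to input noise is to perturb its inputs and re-measure accuracy. The sketch below is illustrative only: `threshold_rule` is a hypothetical stand-in for a trained classifier, the noise is assumed to be multiplicative and Gaussian, and the noise levels are arbitrary. It does not verify the 10% figure above; it only shows the shape of such a check.

```python
import random

def accuracy_under_noise(predict, xs, labels, rel_noise, rng):
    """Accuracy of `predict` when each input is scaled by (1 + rel_noise * z), z ~ N(0, 1)."""
    hits = 0
    for x, label in zip(xs, labels):
        noisy_x = x * (1 + rel_noise * rng.gauss(0, 1))
        hits += predict(noisy_x) == label
    return hits / len(xs)

def threshold_rule(x):
    """Hypothetical stand-in for a trained model: class 1 when x exceeds 5."""
    return int(x > 5.0)

rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(1000)]
labels = [threshold_rule(x) for x in xs]   # clean labels, so noise-free accuracy is 1.0

noise_levels = [0.0, 0.05, 0.10]
results = {
    level: accuracy_under_noise(threshold_rule, xs, labels, level, rng)
    for level in noise_levels
}

for level, acc in results.items():
    print(f"relative input noise {level:.0%}: accuracy {acc:.3f}")
```

Running the same check against a real model and realistic noise levels for the domain gives a rough picture of how much input variance the predictions tolerate before they degrade.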
About the Author
Stephen Chen spent over 10 years as an advanced analytics management consultant in diverse industries (including financial, retail, CPG, automotive, telecom, technology, healthcare, agency etc.) helping senior management turn around their businesses with his blend of data science, strategy, and human insights expertise. Along the way, he has audited numerous models and developed analytic techniques that address shortcomings of existing methods.
Originally trained as a social scientist, Stephen was one of a handful of researchers to specialize in Social Network Analysis before the ascendance of Google and Facebook popularized those algorithms. He taught classes on Critical Statistics at university and has been a passionate advocate for model robustness and reliability.