### ML07: What is “Robust”?

#### robust statistics / robust model / robustness

Read time: 10 min

Robustness is a commonly used but rarely elaborated concept in statistics and machine learning. Let’s get started with some instances:

1. Robust: median, IQR, trimmed mean, Winsorized mean

2. Non-robust: mean, SD, range
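To make the contrast concrete, here is a minimal standard-library Python sketch (the numbers are illustrative, not from the article): one gross outlier drags the mean and SD far away, while the median and IQR barely move.

```python
import statistics

clean = [2, 3, 3, 4, 5, 5, 6, 7]
dirty = clean[:-1] + [700]   # one gross outlier replaces the 7

for label, xs in [("clean", clean), ("dirty", dirty)]:
    q1, _, q3 = statistics.quantiles(xs, n=4)   # quartiles
    print(label,
          "mean=%.1f" % statistics.mean(xs),    # non-robust
          "sd=%.1f" % statistics.stdev(xs),     # non-robust
          "median=%.1f" % statistics.median(xs),  # robust
          "iqr=%.1f" % (q3 - q1))               # robust
```

The median and IQR of the two samples are identical, while the mean jumps from about 4.4 to 91.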

Outline

(1) Definition of “Robust”

(2) Dealing with Errors and Outliers

1. Treat outliers as errors, then remove them

2. Use domain knowledge to identify possible outliers

3. Use robust methods

(3) Another Instance

(4) Parametric, Non-parametric and Robust Approaches

1. Parametric approach

2. Robust approach

3. Non-parametric approach

(5) Another Robust Method: Resampling

1. Resampling

2. Jack-knifing

3. Bootstrap

(6) Reference

“All models are wrong, but some are useful” — G. E. P. Box

### (1) Definition of “Robust”

Let’s take a close look at the definitions of “robust / robustness” from a variety of sources:

1. Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. [2]

2. A robust concept will operate without failure and produce positive results under a variety of conditions. For statistics, a test is robust if it still provides insight into a problem despite having its assumptions altered or violated. In economics, robustness is attributed to financial markets that continue to perform despite alterations in market conditions. In general, a system is robust if it can handle variability and remain effective. [3]

3. Robust statistics, therefore, are any statistics that yield good performance when data is drawn from a wide range of probability distributions that are largely unaffected by outliers or small departures from model assumptions in a given dataset. In other words, a robust statistic is resistant to errors in the results. [4]


Then, we turn to a classic statistics book, *Problem Solving: A Statistician’s Guide*, published in 1988:

4. A statistical procedure which is not much affected by minor departures is said to be robust, and it is fortunate that many procedures have this property. For example, the t-test is robust to departures from normality. [5]

### (2) Dealing with Errors and Outliers

Now that we have gone through several definitions with slight variations, let’s see **why we need robust statistics and robust models**.

When handling outliers, we have a few approaches at hand:

#### 1. Treat outliers as errors, then remove them

This straightforward method is often adopted without full consideration. Here, we remove the outliers, leaving the affected data points as missing values.

#### 2. Use domain knowledge to identify possible outliers

It may be sensible to treat an outlier as a missing observation, but this may be improper if the distribution is heavy-tailed.

Extreme observations which may, or may not, be errors are more difficult to handle. There are tests for deciding which outliers are ‘significant’, but they are less important than advice from people ‘in the field’ as to which suspect values are obviously silly or impossible and should be viewed with caution. [5]

#### 3. Use robust methods

An alternative approach is to use robust methods of estimation which **automatically downweight extreme observations**. For example, one possibility for univariate data is to use Winsorization, by which extreme observations are adjusted towards the overall mean, perhaps to the second or third most extreme value (either large or small as appropriate). However, many analysts prefer a diagnostic approach which isolates unusual observations for further study. [5]
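As a rough sketch of the idea (the `winsorize` helper and the numbers are illustrative; a real analysis would typically use a library routine such as SciPy’s), the most extreme value in each tail is pulled in to the nearest retained observation before averaging:

```python
def winsorize(xs, k=1):
    """Clamp the k smallest and k largest values to the
    nearest retained observation; leave the rest alone."""
    xs = sorted(xs)
    return [xs[k]] * k + xs[k:len(xs) - k] + [xs[-k - 1]] * k

data = [1, 12, 13, 14, 15, 16, 17, 99]   # 1 and 99 look suspect
print(winsorize(data))                   # [12, 12, 13, 14, 15, 16, 17, 17]
print(sum(winsorize(data)) / len(data))  # 14.5, vs. a raw mean of 23.375
```

Unlike outright deletion, Winsorization keeps the sample size intact while limiting the influence of the tails.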

My recommended procedure for dealing with outlying observations, when there is no evidence that they are errors, is to repeat the analysis with and without the suspect values. If the conclusions are similar, then the suspect values “don’t matter”. If the conclusions differ substantially, then one should be wary of making judgements which depend so crucially on just one or two observations (called influential observations). [5]
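This with-and-without check can be sketched in a few lines (illustrative numbers, not from the article): run the same summary twice and compare the conclusions.

```python
import statistics

data = [4.1, 4.3, 3.9, 4.2, 4.0, 9.8]   # 9.8 is the suspect value

with_all = statistics.mean(data)        # analysis including the suspect value
without = statistics.mean(data[:-1])    # analysis excluding it
print(with_all, without)  # a gap this large marks 9.8 as influential
```

Here the mean moves from about 4.1 to about 5.05, so any conclusion resting on that difference deserves extra caution.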

### (3) Another Instance

The assumptions of LDA (linear discriminant analysis) are that features are *independent, continuous, and normally distributed*. If these assumptions are violated, LDA performs badly; **in this case, regression is more robust than LDA**, and **a neural network is more robust than regression**. [6]

With this instance understood, we can move on to a broader scope.

### (4) Parametric, Non-parametric and Robust Approaches [5]

Now we know that a model is only an approximation to reality. The model can be spoiled by:

(a) *Occasional gross errors*. (Gross errors are caused by *experimenter carelessness* or *equipment failure*. These “outliers” are so far above or below the true value that they are usually discarded when assessing data. The “Q-Test” is a systematic way to determine if a data point should be discarded. [7])

(b) *Departures from the secondary assumptions, i.e. distributional assumptions*, e.g. the data are not normal or are not independent.

(c) *Departures from the primary assumptions*.

“Traditional” statisticians usually get around (a) with diagnostic checks, where unusual observations are isolated or ‘flagged’ for further study. This can be regarded as a step towards robustness.

Here are three approaches we can adopt to tackle the issues above:

#### 1. Parametric approach

A classical parametric model-fitting approach comes first to mind. There are four main assumptions of the parametric approach [8]:

{1} Normal distribution of data

{2} Homogeneity of variance

{3} Interval data

{4} Independence

#### 2. Robust approach

Robust methods may involve fitting a parametric model but employ procedures which **do not depend critically on the assumptions** implicit in the model. In particular, outlying observations are usually automatically downweighted. **Robust methods can therefore be seen as lying somewhere between classical and non-parametric methods**.

Some statisticians prefer a robust approach to most problems on the grounds that little is lost when no outliers are present, but much is gained if there are. Outliers may spoil the analysis completely, and thus some robust procedures may become routine.

#### 3. Non-parametric approach

A non-parametric (or distribution-free) approach **makes as few assumptions about the distribution** of the data as possible. It is widely used for analyzing social-science data, which are often not normally distributed but rather **may be severely skewed**.

Non-parametric methods get around problem (b) above and perhaps (a) to some extent. Their attractions are that (by definition) they are valid under minimal assumptions and generally have satisfactory efficiency and robustness properties. Some of the methods are **tedious computationally**, although this is not a problem when a computer is available. However, **non-parametric results are not always as readily interpretable as those from a parametric analysis**. Non-parametric analysis should thus be reserved for special types of data, notably **ordinal data or data from a severely skewed or otherwise non-normal distribution.**

Let’s further probe into “Nonparametric Tests vs. Parametric Tests” [9], which lists the advantages of each:

#### Advantages of Parametric Tests

1. Parametric tests can provide trustworthy results with distributions that are skewed and non-normal (given a sufficiently large sample size)

2. Parametric tests can provide trustworthy results when the groups have different amounts of variability

3. Parametric tests have greater statistical power

#### Advantages of Nonparametric Tests

1. Nonparametric tests assess the median, which can be a better measure for some study areas

2. Nonparametric tests are valid when the sample size is small and the data are potentially non-normal

3. Nonparametric tests can analyze ordinal data, ranked data, and outliers
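As one concrete distribution-free procedure, here is a sign test for the median, sketched in pure Python (the helper name and data are illustrative): under H0 the count of observations above the hypothesized median is Binomial(n, 1/2), so no distributional assumption about the data is needed.

```python
from math import comb

def sign_test_p(xs, m0):
    """Two-sided sign test of H0: population median = m0.
    Under H0, the number of values above m0 is Binomial(n, 1/2)."""
    diffs = [x - m0 for x in xs if x != m0]   # drop exact ties
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    tail = min(k, n - k)                      # size of the smaller tail
    p_one = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_one)                # cap the two-sided p at 1

sample = [5.1, 5.3, 4.9, 5.6, 5.8, 5.2, 5.4]
print(sign_test_p(sample, 4.5))   # 0.015625: median likely differs from 4.5
```

All seven observations lie above 4.5, so the two-sided p-value is 2 × (1/2)^7 ≈ 0.016.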

Initial data analysis may help indicate which approach to adopt. However, if still unsure, it may be worth trying more than one method. If, for example, parametric and non-parametric tests both indicate that an effect is significant, then one can have confidence in the result. If, however, the conclusions differ, then more attention must be paid to the truth of secondary assumptions.

### (5) Another Robust Method: Resampling [5]

#### 1. Resampling

There are a number of estimation techniques which rely on *resampling* the observed data to assess the properties of a given estimator. They are useful for providing non-parametric estimators of the bias and standard error of the estimator **when its sampling distribution is difficult to find or when parametric assumptions are difficult to justify**.

#### 2. Jack-knifing

The usual form of *jack-knifing* is an extension of resampling. Given a sample of n observations, the observations are dropped one at a time, giving n (overlapping) groups of (n-1) observations (cf. Leave-One-Out Cross-Validation, LOOCV). The estimator is calculated for each group, and these values provide estimates of the bias and standard error of the overall estimator.
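A minimal sketch of the jack-knife standard error (illustrative helper name; for the sample mean it reproduces the familiar s/√n):

```python
import statistics

def jackknife_se(xs, estimator):
    """Leave-one-out (jack-knife) standard error of an estimator."""
    n = len(xs)
    reps = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]  # n groups of n-1
    rbar = sum(reps) / n
    return ((n - 1) / n * sum((r - rbar) ** 2 for r in reps)) ** 0.5

data = [3, 5, 7, 9, 11]
print(jackknife_se(data, statistics.mean))  # equals s/sqrt(n) for the mean
```

The same helper works unchanged for other estimators (median, trimmed mean, …), which is precisely when the jack-knife earns its keep.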

#### 3. Bootstrap

A promising alternative way of re-using the sample is *bootstrapping*. The idea is to simulate the properties of a given estimator by taking repeated samples of size n **with replacement** from the observed empirical distribution, in which X1, X2, …, Xn are each given probability mass 1/n. (cf. *jack-knifing* takes samples of size (n-1) **without replacement**.) Each sample gives an estimate of the unknown population parameter.

The average of these values is called the *bootstrap estimator*, and their variance is called the *bootstrap variance*. A close relative of *jack-knifing*, called cross-validation (CV), is not primarily concerned with estimation, but rather with **assessing the prediction error of different models**. Leaving out one (or more) observations at a time (i.e. Leave-One-Out Cross-Validation, LOOCV), a model is fitted to the remaining points and used to predict the deleted points.
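The bootstrap estimator and bootstrap variance described above can be sketched as follows (illustrative helper name and data; `b` resamples are drawn with replacement):

```python
import random
import statistics

def bootstrap(xs, estimator, b=2000, seed=42):
    """Draw b samples of size n with replacement; return the
    bootstrap estimate (mean of the replicates) and the
    bootstrap standard error (their standard deviation)."""
    rng = random.Random(seed)
    reps = [estimator(rng.choices(xs, k=len(xs))) for _ in range(b)]
    return statistics.mean(reps), statistics.stdev(reps)

data = [3, 5, 7, 9, 11]
est, se = bootstrap(data, statistics.median)
print(est, se)   # bootstrap estimate and SE of the sample median
```

Swapping in any other estimator requires no new theory, which is the main appeal when the sampling distribution is hard to derive analytically.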

### (6) Reference

[1] The University of Adelaide (n.d.). Robust Statistics. Retrieved from “Robust statistics”

[2] Wikipedia (n.d.). Robust statistics. Retrieved from “Robust statistics”

[3] Kenton, W. (2020). Robust. Retrieved from “Robust”

[4] Taylor, C. (2019). Robustness in Statistics. Retrieved from “Robustness: The Strength of Statistical Models”

[5] Chatfield, C. (1988). *Problem Solving: A Statistician’s Guide*. London, UK: Chapman & Hall.

[6] Lewis, N.D. (2016). *Learning from Data Made Easy with R: A Gentle Introduction for Data Science*. CreateSpace Independent Publishing Platform.

[7] University of California (n.d.). Analysis of Errors. Retrieved from http://faculty.sites.uci.edu/chem1l/files/2013/11/RDGerroranal.pdf

[8] Klopper, J.H. (n.d.). Assumptions for parametric tests. Retrieved from RPubs

[9] Frost, J. (n.d.). Nonparametric Tests vs. Parametric Tests. Retrieved from Statistics By Jim


ML07: What is “robust” ? was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Credit: BecomingHuman By: Morton Kuo