Before starting with the main topic, I would like to briefly introduce regression analysis and time series data.

Regression analysis is one of the most popular and frequently used statistical techniques in machine learning. It is useful for investigating and modelling the relationship between a dependent feature/variable (y) and one or more independent features/variables (x).
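As a minimal sketch (with made-up numbers, using NumPy's least-squares fit), a simple linear regression of y on a single x looks like:

```python
import numpy as np

# Hypothetical data: one independent variable x and a dependent variable y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Fit y = slope * x + intercept by ordinary least squares
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept   # fitted values
residuals = y - y_hat           # the part of y the model does not explain
```

The residuals are the quantity we will keep coming back to: autocorrelation in a time series regression shows up in them.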

In simple words, time series data is data whose points are recorded in a time sequence. In other words, the data is collected at different points in time.

Example: the annual expenditures of a particular person.

I hope you now understand what regression analysis and time series data are. Let’s come to the point.

Many applications of regression analysis involve independent/predictor and dependent/response variables that are both time series, meaning the variables are recorded in a time sequence. The assumption of uncorrelated or independent errors, typically made for regression data that is not time-dependent, is usually not appropriate for time series data. The errors in time series data often exhibit an autocorrelated structure. Autocorrelation, also known as serial correlation, means that the errors are correlated with themselves across different time periods.
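As an illustrative sketch (the residual series here is entirely hypothetical), the lag-1 serial correlation of the errors can be estimated directly with NumPy by correlating the residual series with itself shifted by one time step:

```python
import numpy as np

# Hypothetical residuals from a time series regression, ordered by time
e = np.array([1.2, 0.9, 1.1, 0.4, -0.2, -0.8, -1.1, -0.5, 0.3, 0.9])

# Lag-1 autocorrelation: correlation of e_t with e_{t-1}
r1 = np.corrcoef(e[:-1], e[1:])[0, 1]
```

A value of `r1` well away from zero suggests that consecutive errors are not independent.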

There are many sources of autocorrelation in time series regression data. In many cases, the cause of autocorrelation is the analyst’s failure to include one or more important predictor variables in the model.

Example: suppose we wish to regress the sales of a product in a particular region of the country against the annual advertising expenditure for that product.

In the above example, the growth of the population in that region over the period of the study will also influence product sales. Failing to include population size may cause the errors in the model to be positively autocorrelated, because if the per-capita demand for the product is either constant or increasing with time, population size is positively correlated with product sales.
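To make this concrete, here is a small simulation (all numbers made up) in which sales truly depend on both advertising and a steadily growing population, but the model is fit on advertising alone. The omitted trend ends up in the residuals, which therefore show strong positive lag-1 autocorrelation:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(40)                                   # 40 time periods

advertising = 10 + rng.normal(0, 1, size=40)        # hypothetical spend
population = 100 + 2.0 * t                          # grows steadily over time
sales = 5 + 3 * advertising + 0.5 * population + rng.normal(0, 1, size=40)

# Fit sales on advertising only, omitting the population variable
slope, intercept = np.polyfit(advertising, sales, deg=1)
residuals = sales - (slope * advertising + intercept)

# The omitted trend shows up as strong positive lag-1 autocorrelation
r1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
```

Adding `population` as a second predictor would absorb the trend and make this autocorrelation disappear.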

The presence of autocorrelation in the errors has several effects on the ordinary least-squares regression procedure.

- Regression coefficients are still unbiased, but they are no longer minimum-variance estimates.
- When the errors are positively autocorrelated, the residual mean square may seriously underestimate the error variance.
- Confidence intervals, prediction intervals, and tests of hypotheses based on the t and F distributions are, strictly speaking, no longer exact procedures.

We can deal with autocorrelation using three approaches. If autocorrelation is present because one or more predictors were omitted, and if the analyst can identify and include those predictors in the model, then the observed autocorrelation should disappear.

As another option for dealing with the problem of autocorrelation, the weighted least squares or generalised least squares method can be used if there is sufficient knowledge of the autocorrelation structure. If these approaches cannot be used, then the analyst must turn to a model that explicitly includes the autocorrelation structure. Such models usually require special parameter estimation techniques.

How can we identify whether autocorrelation is present in our data? This is a very common question for every analyst, so that is what I am going to discuss: how we can detect autocorrelation using statistical techniques, with examples.
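One well-known special estimation technique of this kind, not covered in detail here, is the Cochrane-Orcutt procedure: estimate the AR(1) parameter ρ from the OLS residuals, then refit on the generalised differences y*_t = y_t - ρ·y_{t-1} and x*_t = x_t - ρ·x_{t-1}. A one-iteration sketch with simulated (made-up) data:

```python
import numpy as np

# Simulate y_t = b0 + b1*x_t + e_t, with AR(1) errors e_t = rho*e_{t-1} + a_t
rng = np.random.default_rng(1)
n = 200
x = rng.normal(0, 1, size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal(0, 1)
y = 2.0 + 3.0 * x + e

# Step 1: ordinary least squares, then estimate rho from the residuals
b1, b0 = np.polyfit(x, y, deg=1)
res = y - (b1 * x + b0)
rho = np.corrcoef(res[:-1], res[1:])[0, 1]

# Step 2: refit on generalised differences (one Cochrane-Orcutt iteration)
y_star = y[1:] - rho * y[:-1]
x_star = x[1:] - rho * x[:-1]
b1_star, b0_star = np.polyfit(x_star, y_star, deg=1)
b0_star = b0_star / (1 - rho)   # recover the intercept of the original model
```

The transformed errors are approximately uncorrelated, so the usual OLS inference on the transformed model is much closer to being valid.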


Residual plots can be useful for detecting autocorrelation. A plot of the residuals versus time is the most meaningful and useful visualisation.

There are two possibilities when detecting autocorrelation.

**Positive autocorrelation** : Positive autocorrelation is indicated by a cyclical pattern in the residuals over time. The correlation between observations recorded in time sequence is positive.

**Negative autocorrelation** : Negative autocorrelation is indicated by an alternating pattern in which the residuals cross the time axis more frequently than they would if distributed randomly. The correlation between observations recorded in time sequence is negative.

See the figure below, which visualises autocorrelation by showing the relation between the residuals (Y-axis) and time (X-axis).
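These two patterns can also be checked numerically. A rough sketch (simulated AR(1) residual series, all numbers made up): negatively autocorrelated residuals cross the time axis far more often than positively autocorrelated ones.

```python
import numpy as np

def ar1(rho, n=500, seed=42):
    """Simulate an AR(1) residual series: e_t = rho * e_{t-1} + a_t."""
    rng = np.random.default_rng(seed)
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + rng.normal(0, 1)
    return e

def sign_changes(e):
    """Count how often the residual series crosses the time axis."""
    return int(np.sum(np.sign(e[:-1]) != np.sign(e[1:])))

pos = sign_changes(ar1(rho=0.8))    # smooth, cyclical-looking: few crossings
neg = sign_changes(ar1(rho=-0.8))   # alternating pattern: many crossings
```

For a purely random series of 500 points, we would expect roughly 250 crossings; the positively autocorrelated series falls well below that and the negatively autocorrelated one well above it.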

Various statistical tests can be used to detect the presence of autocorrelation. The test developed by Durbin and Watson (1950, 1951, 1971) is a very widely used procedure. It tests for first-order autocorrelation, i.e. it assumes that the errors in the regression model are generated by a first-order autoregressive process observed at equally spaced time periods.
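The Durbin-Watson statistic itself is simple to compute from the residuals: d = Σ(e_t - e_{t-1})² / Σe_t². A value near 2 suggests no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. A minimal NumPy sketch (with illustrative residual series):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Independent residuals should give d near 2
rng = np.random.default_rng(7)
d_indep = durbin_watson(rng.normal(0, 1, size=1000))

# A smoothly trending (positively autocorrelated) series gives d near 0
d_pos = durbin_watson(np.sin(np.linspace(0, 3, 1000)))
```

In practice the statistic is compared against Durbin and Watson's tabulated lower and upper critical bounds, which depend on the sample size and the number of predictors.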