Then, historical data on the Australian labour market for 2019–20 were explored to test the above hypotheses. Graphs 4 and 5 show that the population aged 15 years and over was not much affected in 2018–20. However, the Australian labour force dropped dramatically while the unemployment rate rocketed. Hence, the latter case occurred. Because the second hypothesis has two potential sub-cases, data on people not in the labour force and on the population aged 15 years and over in 2018–20 were evaluated to gain further insight. According to Table 2, the main reason for not joining the labour force is that people were not actively looking for work, and the sharp upward shift of the line was also driven by an 82% increase in that category (Australian Bureau of Statistics, 2020). Hence, the lockdown has affected Australians' sentiment and their job-search activity.
Second, regarding real-wage unemployment, the 1.75% minimum wage increase implemented in July 2020 (Fair Work Ombudsman, 2020) may have affected the unemployment rate. Under normal conditions, lifting the wage-setting curve can positively assist the labour market by shifting E to E1 and U to U1. However, some companies may not be able to bear the new wage level, which reduces the quantity of labour from Q1 to Q2, shifts the labour supply line towards L1, and moves U to U1 (Graphs 6 and 7, respectively). Because the increase will not be fully effective until February 2021, and real-wage unemployment responds with some delay, more data need to be collected and explored at a later stage.
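The mechanism described above (a wage that firms cannot bear reduces the quantity of labour demanded, leaving workers unemployed) can be illustrated with a toy model. This is a minimal sketch only: it swaps in a simple linear supply-and-demand wage-floor model rather than the wage-setting/price-setting diagrams in Graphs 6 and 7, and every curve parameter below (a, b, c, d, the market-clearing wage of 20) is hypothetical, not estimated from Australian data.

```python
# Toy linear labour market; ALL parameters are hypothetical, for illustration only.
# Labour demand falls as the real wage rises; labour supply rises with it.

def labour_demand(w: float, a: float = 100.0, b: float = 2.0) -> float:
    """Workers firms want to hire at real wage w (hypothetical linear curve)."""
    return max(a - b * w, 0.0)


def labour_supply(w: float, c: float = 20.0, d: float = 2.0) -> float:
    """Workers willing to work at real wage w (hypothetical linear curve)."""
    return c + d * w


def real_wage_unemployment(w_floor: float) -> float:
    """Excess supply of labour when the wage is held at w_floor.

    A floor at or below the market-clearing wage creates no real-wage
    unemployment in this toy model, so the result is clipped at zero.
    """
    return max(labour_supply(w_floor) - labour_demand(w_floor), 0.0)


# Market clears where 100 - 2w = 20 + 2w, i.e. w* = 20 with 60 employed.
w_star = 20.0
for raise_pct in (0.0, 1.75, 5.0):
    w = w_star * (1 + raise_pct / 100)
    print(f"raise {raise_pct:>5}%: employed={labour_demand(w):.2f}, "
          f"unemployed={real_wage_unemployment(w):.2f}")
```

In this toy setting, any wage pushed above the market-clearing level lowers employment (the move from Q1 to Q2) and opens a gap between supply and demand; how large that gap is in practice depends entirely on the real slopes of the curves, which this sketch does not estimate.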
Lastly, regarding demand-side causes, when the long-run aggregate supply (LRAS) line shifted backward, the wage-setting curve was affected and shifted to the left, and demand-deficient unemployment was recorded (Graphs 8 and 9). Graphs 10 and 11 provide clear evidence of demand-deficient unemployment.
In summary, three scenarios were reviewed to give an overview of the potentially related factors to choose and analyze. For forecasting, time-series analysis and regression analysis were chosen to conduct the predictive analysis.
Part 2. Time-series forecast: Random Walk, Simple Exponential Smoothing, Holt’s trend, and ARIMA (Autoregressive Integrated Moving Average)
predicted value = intercept + weighted lagged values (AR terms) + weighted lagged errors (MA terms)
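The formula above can be made concrete as a one-step-ahead ARMA(p, q) prediction. This is a minimal sketch, assuming the series has already been differenced as needed (the "I" in ARIMA) and that the coefficients are known; the coefficient values in the usage line are hypothetical. It covers only the non-seasonal part, so the seasonal MA terms of a model such as SARIMA(2,0,2)(0,0,2)(12), which add errors at lags 12 and 24, are not included.

```python
from typing import Sequence


def arma_one_step(history: Sequence[float], errors: Sequence[float],
                  intercept: float, phi: Sequence[float],
                  theta: Sequence[float]) -> float:
    """One-step-ahead ARMA(p, q) prediction.

    prediction = intercept
                 + sum_i phi[i]   * history[-1 - i]   (lagged values, AR part)
                 + sum_j theta[j] * errors[-1 - j]    (lagged errors, MA part)
    """
    if len(history) < len(phi) or len(errors) < len(theta):
        raise ValueError("not enough lags for the requested orders")
    ar_part = sum(p * history[-1 - i] for i, p in enumerate(phi))
    ma_part = sum(t * errors[-1 - j] for j, t in enumerate(theta))
    return intercept + ar_part + ma_part


# Hypothetical ARMA(2, 1) coefficients, for illustration only.
print(arma_one_step(history=[5.0, 5.2], errors=[0.1, -0.2],
                    intercept=0.5, phi=[0.6, 0.3], theta=[0.4]))
```

Here the AR part is 0.6 × 5.2 + 0.3 × 5.0 = 4.62, the MA part is 0.4 × (−0.2) = −0.08, and the prediction is 0.5 + 4.62 − 0.08 = 5.04.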
Before moving on, it is necessary to check whether the dataset is white noise. The white-noise criteria are: (1) the mean equals 0, (2) the standard deviation is constant, and (3) the autocorrelation at every lag is 0.
First, Graph 12 shows that the mean of the dataset was not equal to 0 and that its standard deviation was not constant. The “Acf” function from the forecast package in R was used to test the correlation between lags, and the autocorrelations were not equal to 0 (Graph 13). Note that because this autocorrelation function comes from the forecast package rather than the stats package, the horizontal axis shows lags in time units (months) rather than seasonal units.
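Criterion (3) can be checked numerically with the sample autocorrelation function. This is a minimal sketch of the standard estimator that R's acf and forecast's Acf plot. The ±1.96/√n cut-off matches the usual 95% dashed bounds on those plots. The helper names are hypothetical, and the example series are synthetic rather than the unemployment data.

```python
import math
from statistics import fmean


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag.

    r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2
    """
    n = len(x)
    m = fmean(x)
    denom = sum((v - m) ** 2 for v in x)
    return [
        sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(x, max_lag, z=1.96):
    """Lags whose autocorrelation falls outside the approximate bounds +/- z/sqrt(n).

    For white noise, only about 5% of lags should fall outside the bounds by chance.
    """
    bound = z / math.sqrt(len(x))
    return [k for k, r in enumerate(sample_acf(x, max_lag), start=1) if abs(r) > bound]


trend = list(range(1, 101))          # a trending (non-white-noise) series
print(significant_lags(trend, 12))   # many lags flagged, starting at lag 1
```

A trending series like the one above, or like the unemployment rate in Graph 12, produces large, slowly decaying autocorrelations, which is the pattern that rules out white noise.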
Furthermore, the Augmented Dickey-Fuller (ADF) test was conducted to check whether the dataset is stationary. The result indicates that the time series is not stationary (Graph 14). Detailed information about the null hypothesis testing can be found in the GitHub file. First differencing is therefore applied to make the series stationary for the autoregressive integrated moving average (ARIMA) analysis at a later stage.
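The differencing step can be sketched as below. This is a minimal illustration with hypothetical helper names. It does not reimplement the ADF test itself, which in practice comes from a library routine such as tseries::adf.test in R or statsmodels' adfuller in Python. The inverse function shows how forecasts made on the differenced scale are turned back into levels.

```python
def difference(x, lag=1):
    """Return x_t - x_{t-lag}.

    lag=1 gives the first difference; lag=12 would give a seasonal
    difference for monthly data.
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def undifference(last_level, diffs):
    """Invert a first difference: rebuild levels starting from the last observed level."""
    levels = []
    level = last_level
    for d in diffs:
        level += d
        levels.append(level)
    return levels


series = [1, 3, 6, 10]
d1 = difference(series)               # [2, 3, 4]
print(d1, undifference(series[0], d1))  # rebuilds [3, 6, 10]
```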
Then, the dataset was split into training and test sets. The training set contains data from February 1978 to June 2020, and the test set contains data from July 2020 to November 2020. Four models were implemented for forecasting: (1) Random Walk Forecast, (2) Simple Exponential Smoothing (SES), (3) Holt’s trend, and (4) Seasonal Autoregressive Integrated Moving Average, SARIMA(2,0,2)(0,0,2)(12). Model (2) outperformed the others, with a mean absolute percentage error (MAPE) of 1.438719% (Table 4). A detailed explanation of the model-building methods can be found in the GitHub file.
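The evaluation above compares forecasts on a held-out tail of the series using MAPE. The sketch below shows the random walk forecast, SES and MAPE under simplifying assumptions. Alpha is fixed by hand, and the initial level is set to the first observation, whereas R's forecast::ses estimates both by default. The series is hypothetical, and its last five points stand in for the July to November 2020 test set.

```python
def naive_forecast(train, h):
    """Random walk (naive) forecast: repeat the last observed value h times."""
    return [train[-1]] * h


def ses_forecast(train, h, alpha):
    """Simple exponential smoothing with a fixed smoothing parameter alpha.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}; forecasts stay flat
    at the final level. Initial level = first observation (an assumption).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * h


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly series; the last 5 points act as the test set.
series = [5.1, 5.2, 5.1, 5.3, 5.2, 6.4, 7.1, 7.5, 6.9, 7.0, 6.8]
train, test = series[:-5], series[-5:]
for name, fc in [("naive", naive_forecast(train, len(test))),
                 ("SES alpha=0.8", ses_forecast(train, len(test), 0.8))]:
    print(name, round(mape(test, fc), 3))
```

With alpha = 1, SES reduces to the random walk forecast. Smaller values of alpha average over more of the history, which is one reason SES can beat the random walk when recent observations are noisy.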
The four-month prediction was conducted, and the results are plotted in Graph 15. While Holt’s trend method suggests that the unemployment rate will keep increasing, the SARIMA(2,0,2)(0,0,2)(12) model indicates that the unemployment rate could decrease, reaching 6.8% in March 2021 (Table 5).
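Holt’s method extends SES with a trend component and extrapolates the latest estimated trend in a straight line. After a sharp rise, this tends to keep forecasts increasing. The sketch below is a minimal version with fixed smoothing parameters and a simple initialisation: initial level is the first observation and initial trend is the first difference. Both are assumptions, since R's forecast::holt estimates these values by default.

```python
def holt_forecast(train, h, alpha, beta):
    """Holt's linear trend method with fixed smoothing parameters.

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    forecast_{T+k} = level_T + k * trend_T
    """
    if len(train) < 2:
        raise ValueError("need at least two observations")
    level, trend = train[0], train[1] - train[0]
    for y in train[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + k * trend for k in range(1, h + 1)]


# A hypothetical series that jumps upward near the end keeps rising in the forecast.
print(holt_forecast([5.2, 5.1, 5.3, 6.4, 7.1, 7.5], h=4, alpha=0.8, beta=0.3))
```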
Graph 15: Forecast Monthly Unemployment Rate, Dec 2020 to Mar 2021 & Table 5: Forecast in detail
Part 3. Studying the influencing factors with a machine learning technique: Multiple Linear Regression
Graph 16 shows that the variables with a strong statistical relationship with the unemployment rate are GDPchange, GDPprice, Wage.Victoria, Wage.total, Wage.alter, GDPindex, Domesticdemand, and Wage.WAustralia. However, some multicollinearity is likely among them, so it was diagnosed with the vif function from the car package in R. The criteria for selecting variables and building the model are (1) p-value < 0.05, (2) VIF < 10, and (3) R-squared >= 0.3. According to Moore, Notz, and Fligner (2013), an r-squared below 0.3 indicates no effect or a very weak effect size, and according to James et al. (2014), a VIF value above 5 or 10 indicates a problematic amount of collinearity, so such variables should be removed from the model.
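The VIF of predictor j is 1 / (1 − R²_j), where R²_j comes from regressing predictor j on all the other predictors. The sketch below computes it from that definition in pure Python, using a small normal-equations OLS solver. The helper names and the example columns are hypothetical. For purely numeric predictors, the result should agree with car::vif, which also handles factor terms through generalised VIFs.

```python
def _solve(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfect collinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, predictors):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def vif(columns):
    """VIF_j = 1 / (1 - R^2_j), regressing column j on all the other columns."""
    out = []
    for j, col in enumerate(columns):
        r2 = r_squared(col, columns[:j] + columns[j + 1:])
        out.append(float("inf") if r2 >= 1 else 1 / (1 - r2))
    return out


# Two nearly collinear hypothetical predictors: both VIFs are far above 10.
print(vif([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10.5]]))
```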
The dataset was split into training and test sets at a ratio of 80:20. Detailed information on the hypothesis testing, the variable selection process, and the thought process can be found in the GitHub file.
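The text does not say whether the 80:20 split was chronological or random, so the sketch below supports both. The function name and the default seed are hypothetical. For quarterly time-series inputs, keeping the original order avoids training on quarters that come after the test quarters.

```python
import random


def train_test_split(rows, train_ratio=0.8, shuffle=False, seed=42):
    """Split rows into (train, test) according to train_ratio.

    shuffle=False keeps the original (chronological) order;
    shuffle=True gives a reproducible random split.
    """
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be between 0 and 1")
    rows = list(rows)
    if shuffle:
        random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


quarters = [f"Q{i}" for i in range(1, 11)]  # hypothetical quarter labels
train, test = train_test_split(quarters)
print(train, test)
```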
Quarterly Unemployment Rate (%) = 6.16691 + (-0.92121) * [Domestic final demand: Index - Percentage change] + (-0.17724) * [Wages quarterly percentage change ; Total (State); Total (Industry)]
Standard deviation of residuals: 0.7092026
Three cases were created to forecast the unemployment rate in 1Q2021, based on the assumptions stated in Table 6.
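The fitted equation can be applied to each case directly. The coefficients and the residual standard deviation below are copied from the model above. The three scenario inputs are hypothetical placeholders, because the Table 6 assumptions are not reproduced here. The ±1.96 × residual SD band is only a rough approximation: it ignores uncertainty in the estimated coefficients, so it is narrower than a proper prediction interval.

```python
INTERCEPT = 6.16691
COEF_DEMAND = -0.92121  # Domestic final demand: Index - Percentage change
COEF_WAGES = -0.17724   # Wages quarterly percentage change; Total (State); Total (Industry)
RESIDUAL_SD = 0.7092026


def predict_unemployment(demand_pct_change, wages_pct_change):
    """Quarterly unemployment rate (%) from the fitted linear equation."""
    return INTERCEPT + COEF_DEMAND * demand_pct_change + COEF_WAGES * wages_pct_change


def rough_band(point, z=1.96):
    """Approximate band of +/- z residual SDs (ignores coefficient uncertainty)."""
    return point - z * RESIDUAL_SD, point + z * RESIDUAL_SD


# Hypothetical scenario inputs (NOT the Table 6 values): (demand % change, wages % change).
scenarios = {"pessimistic": (-1.0, 0.2), "base": (0.5, 0.4), "optimistic": (1.5, 0.6)}
for name, (demand, wages) in scenarios.items():
    point = predict_unemployment(demand, wages)
    low, high = rough_band(point)
    print(f"{name}: {point:.2f}% (rough band {low:.2f}% to {high:.2f}%)")
```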
A shortcoming of this regression model is that its r-squared (0.3794) is below 0.5. R-squared never decreases when new variables are added, so adjusted R-squared, which penalises additional predictors, should also be considered when evaluating the model’s performance, to ensure that any added variables are useful. Quarterly datasets from different sources were used because of the shortage of data. Therefore, there is also a risk that data collection, pre-processing, and terminology definitions differ between inputs. Consequently, potential optimisation methods are: (1) collecting real-time data at a higher frequency, such as monthly instead of quarterly, and (2) increasing the model's complexity by including variables such as government expenditure on the JobKeeper package and other measures to support economic recovery.
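Adjusted R-squared corrects R-squared for the number of predictors: adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below applies this formula to the reported R² of 0.3794 with p = 2 predictors. The number of observations, n = 40, is a hypothetical value chosen for illustration, since the size of the training set is not stated here.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


n = 40  # hypothetical number of training quarters
print(round(adjusted_r_squared(0.3794, n, 2), 4))  # current two-predictor model
# Adding a weak third predictor that nudges R^2 up to 0.3800 lowers adjusted R^2:
print(round(adjusted_r_squared(0.3800, n, 3), 4))
```

The second line illustrates the point made above: a small gain in R-squared from an extra variable can still reduce adjusted R-squared, which signals that the variable is not pulling its weight.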
Australian Bureau of Statistics, 2020. Reasons people are not in the labour force. [online] Available at: <https://www.abs.gov.au/articles/reasons-people-are-not-labour-force> [Accessed 3 February 2021].
Fair Work Ombudsman, 2020. Welcome to the Fair Work Ombudsman website. [online] Available at: <https://www.fairwork.gov.au/about-us/news-and-media-releases/website-news/the-commission-has-announced-a-1-75-increase-to-minimum-wages> [Accessed 3 February 2021].
James, G., Witten, D., Hastie, T. and Tibshirani, R., 2014. An Introduction to Statistical Learning: With Applications in R. New York, NY: Springer.
Moore, D.S., Notz, W.I. and Fligner, M.A., 2013. The Basic Practice of Statistics. 6th ed. New York, NY: W.H. Freeman and Company, p. 138.