**Introduction and Motivation**

With the pandemic's impact growing every day across almost all parts of the world, containing the spread of the disease has become a pressing question. To combat it, every country has expanded not only its testing facilities but also its medical support, emergency services, and quarantine centers.

In this blog, we model single-step time series prediction with deep learning models, using the medical information available for different states of India.

Given these factors, it is important to have a predictive model that can forecast the number of active cases, deaths, and recoveries based on changes in medical facilities and other infrastructure.

**Single-Step/Multi-Step Time Series Prediction**

Single-step time series prediction is a supervised machine learning task in which the **previous n values** of the series are available as inputs when the next value is predicted. In contrast, multi-step prediction forecasts the next **x future steps**.
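To make the difference concrete, here is a minimal sketch of how a series could be sliced into input windows and targets. The helper `make_windows` and its parameters `n_lags` and `n_steps` are hypothetical names chosen for illustration; they are not part of the code used later in this post.

```python
import numpy as np

def make_windows(series, n_lags, n_steps=1):
    """Slice a 1-D series into (inputs, targets) pairs.

    Each input row holds n_lags consecutive values; the matching target row
    holds the n_steps values that immediately follow them.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_lags - n_steps + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window sizes")
    X = np.stack([series[i:i + n_lags] for i in range(n_samples)])
    y = np.stack([series[i + n_lags:i + n_lags + n_steps] for i in range(n_samples)])
    return X, y

series = np.arange(10)

# single-step: three previous values predict the next one
X_single, y_single = make_windows(series, n_lags=3, n_steps=1)

# multi-step: three previous values predict the next two
X_multi, y_multi = make_windows(series, n_lags=3, n_steps=2)

print(X_single[0], y_single[0])
print(X_multi[0], y_multi[0])
```

With `n_steps=1` every target row has a single value, which is the setting used throughout this post; a larger `n_steps` gives the multi-step case.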

The following figure depicts the different life cycle stages of time-series model training and prediction.

- Feeding **multivariate data** from a single source, or from aggregated sources available directly from the cloud or other 3rd-party providers, into the ML modeling data ingestion system.
- Cleaning, preprocessing, and feature engineering of the data, involving **scaling** and **normalization**.
- Conversion of the data to a **supervised time series**.
- Feeding the data to a deep learning training system that can train different time-series models such as **LSTM, CNN, Bi-LSTM, and CNN+LSTM**, using different combinations of **hidden layers, neurons, batch size, and other hyper-parameters**.
- Forecasting for the **near term** or the **distant future**, using **single-step or multi-step forecasting respectively**.
- Evaluation of error metrics (**MAPE, MAE, ME, RMSE, MPE**) by comparing the forecasts with the actual data when it arrives.
- Re-training the **model and continuously improving it** when the error exceeds a threshold.
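The error metrics named in the list above can be sketched in a few lines of NumPy. This is an illustrative version only: the function name `forecast_errors`, the sign convention (forecast minus actual), and the `MAPE_THRESHOLD` value are assumptions made for this example, and the percentage metrics are undefined when an actual value is zero.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return ME, MAE, MPE, MAPE and RMSE for two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = predicted - actual  # assumed sign convention: forecast minus actual
    return {
        "ME": float(np.mean(error)),                          # mean error (bias)
        "MAE": float(np.mean(np.abs(error))),                 # mean absolute error
        "MPE": float(np.mean(error / actual) * 100),          # mean percentage error
        "MAPE": float(np.mean(np.abs(error / actual)) * 100), # mean absolute percentage error
        "RMSE": float(np.sqrt(np.mean(error ** 2))),          # root mean squared error
    }

metrics = forecast_errors(actual=[100, 200, 400], predicted=[110, 190, 400])
print(metrics)

# hypothetical re-training trigger: the threshold value is illustrative only
MAPE_THRESHOLD = 10.0
needs_retraining = metrics["MAPE"] > MAPE_THRESHOLD
print(needs_retraining)
```

A check like `needs_retraining` is one simple way to express the last stage of the life cycle, where the model is re-trained once the error crosses a chosen threshold.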

**Loading and Selecting Features**

As Delhi had a high number of Covid-19 cases, we build the different DL models for the state of **Delhi (the National Capital of India)**. Further, we limit the date range to 25th March through 6th June 2020. Data up to 29th April is used for training, and data from 30th April to 6th June is used for testing/prediction.

*Here, we have selected features mostly related to the availability of medical facilities such as hospitals, ICU beds, and testing facilities, together with the numbers of cured/discharged/migrated and quarantined people.*

    # unique_states and list_state_all are built when the data is loaded
    # (see the complete source code linked at the end of this post)
    stateName = unique_states[34]
    dataset = list_state_all[34]
    dataset = dataset.sort_values(by='Date', ascending=True)
    dataset = dataset[(dataset['Date'] >= '2020-03-25') & (dataset['Date'] <= '2020-06-06')]

    daterange = dataset['Date'].values
    no_Dates = len(daterange)
    dateStart = daterange[0]
    dateEnd = daterange[no_Dates - 1]

    # keep the case counts and the medical-facility features
    dataset = dataset[['Total Confirmed cases', 'Death',
                       'Cured/Discharged/Migrated', 'coronaenquirycalls',
                       'cumulativepeopleinquarantine', 'negative', 'numcallsstatehelpline',
                       'numicubeds', 'numisolationbeds', 'numventilators',
                       'populationncp2019projection', 'positive',
                       'testpositivityrate',
                       'testspermillion', 'testsperpositivecase', 'testsperthousand',
                       'totaln95masks', 'totalpeoplecurrentlyinquarantine',
                       'totalpeoplereleasedfromquarantine', 'totalppe', 'totaltested',
                       'unconfirmed', 'Active Cases']]

As we have 22 features in total, we ensure that each input feature is first scaled and then converted to a supervised time series. This yields 22 input features plus one predicted output: for the input at time t, the output is the Number of Active Cases time-shifted by one unit (t+1). The rest of the columns are dropped. The code snippet below shows this in detail.

**Feature Scaling**

This is very important in the current problem scope, as the ranges of the features vary widely (from 10 to 1,000,000).

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    # no_features = 22 (every column except one)
    no_features = np.shape(dataset)[1] - 1
    print("No of features", no_features)

    values = dataset.values
    # ensure all data is float
    values = values.astype('float32')
    print(np.shape(values))

    # normalize features to the [0, 1] range
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled = scaler.fit_transform(values)

    # frame as supervised learning: one lagged step in, one step out
    # (series_to_supervised is defined in the next section)
    reframed = series_to_supervised(scaled, 1, 1)
    print(np.shape(reframed))
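To see what min-max scaling does to columns with very different ranges, the sketch below re-implements the idea in plain NumPy: each column is mapped to [0, 1] through (x - min) / (max - min). The helper names are hypothetical, and the handling of constant columns (a span of 1, so they scale to 0) is an assumption of this sketch.

```python
import numpy as np

def minmax_fit(values):
    """Return the per-column minimum and maximum."""
    values = np.asarray(values, dtype=float)
    return values.min(axis=0), values.max(axis=0)

def minmax_transform(values, col_min, col_max):
    """Scale each column to [0, 1]; constant columns map to 0."""
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (np.asarray(values, dtype=float) - col_min) / span

def minmax_inverse(scaled, col_min, col_max):
    """Map scaled values back to the original units."""
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return np.asarray(scaled, dtype=float) * span + col_min

# two toy features whose ranges differ by five orders of magnitude
raw = np.array([[10.0, 1_000_000.0],
                [20.0, 2_000_000.0],
                [40.0, 4_000_000.0]])

col_min, col_max = minmax_fit(raw)
scaled = minmax_transform(raw, col_min, col_max)
restored = minmax_inverse(scaled, col_min, col_max)

print(scaled)
print(np.allclose(restored, raw))
```

After scaling, both columns span the same [0, 1] interval, so no single feature dominates the loss purely because of its units.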

**Convert Time-Series to Supervised Dataset**

This procedure is known as one-step prediction in time series: it uses lagged observations (here a single lag, t-1) as input variables to forecast the current time step (t). Note that this conversion does not by itself make the series stationary; differencing and seasonal adjustment would have to be applied as separate steps.

    import pandas as pd

    # convert series to supervised learning
    def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
        n_vars = 1 if type(data) is list else data.shape[1]
        df = pd.DataFrame(data)
        cols, names = list(), list()
        # input sequence (t-n, ... t-1)
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
            names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
        # forecast sequence (t, t+1, ... t+n)
        for i in range(0, n_out):
            cols.append(df.shift(-i))
            if i == 0:
                names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
            else:
                names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
        # put it all together
        agg = pd.concat(cols, axis=1)
        agg.columns = names
        # drop rows with NaN values
        if dropnan:
            agg.dropna(inplace=True)
        return agg

As we train only on the lagged (t-1) columns and predict one output, the redundant columns 24 to 45 are dropped. The entire dataset is then split into training and testing sets in a 60%:40% ratio, and different deep learning techniques are applied.

    # keep columns 0-23: the (t-1) inputs plus column 23, i.e. var1(t),
    # which is the first column of `dataset` at time t
    reframed.drop(reframed.columns[24:46], axis=1, inplace=True)
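The hard-coded column numbers can be easier to follow on a toy example. The sketch below rebuilds the one-lag frame with NumPy only (for `n_in = n_out = 1` and with NaN rows dropped, this should give the same layout as `series_to_supervised`) and derives the indices to drop from the number of series and the position of the target. The helpers `lag_one_frame` and `columns_to_drop` are hypothetical names introduced for this illustration.

```python
import numpy as np

def lag_one_frame(data):
    """Pair each row at t-1 with the row at t (one lag in, one step out)."""
    data = np.asarray(data, dtype=float)
    return np.hstack([data[:-1], data[1:]])

def columns_to_drop(n_vars, target_col):
    """Indices of every time-t column except the chosen target column."""
    keep = n_vars + target_col
    return [c for c in range(n_vars, 2 * n_vars) if c != keep]

toy = np.arange(12).reshape(4, 3)      # 4 time steps, 3 series
framed = lag_one_frame(toy)            # columns 0-2 are t-1, columns 3-5 are t

drop = columns_to_drop(n_vars=3, target_col=0)
reduced = np.delete(framed, drop, axis=1)

print(framed[0])
print(drop)
print(reduced[0])

# with 23 series and the first column as target, the list is 24..45
print(columns_to_drop(n_vars=23, target_col=0) == list(range(24, 46)))
```

Deriving the list this way also makes it explicit which series ends up as the output: changing `target_col` selects a different column at time t without editing a long list of numbers by hand.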

    # split into train and test sets (chronological, 60% / 40%)
    values = reframed.values
    split_factor = int(dataset.shape[0] * 0.6)
    print(split_factor)
    train = values[:split_factor, :]
    test = values[split_factor:, :]
    print(np.shape(train))
    print(np.shape(test))

    # split into inputs and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

    # reshape input to be 3D [samples, timesteps, features]
    train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
    test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
    print(train_X.shape[1], train_X.shape[2])
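The split and reshape steps can be tried on a small array to check the resulting shapes. This is a sketch under simple assumptions: the helper names `chronological_split` and `to_lstm_input` are hypothetical, and the last column is taken as the output, as in the snippet above.

```python
import numpy as np

def chronological_split(values, train_fraction):
    """Split rows in time order: the first part trains, the rest tests."""
    values = np.asarray(values)
    split = int(len(values) * train_fraction)
    return values[:split], values[split:]

def to_lstm_input(features, timesteps=1):
    """Reshape 2-D [samples, width] into 3-D [samples, timesteps, features]."""
    samples, width = features.shape
    if width % timesteps:
        raise ValueError("width must be divisible by timesteps")
    return features.reshape(samples, timesteps, width // timesteps)

values = np.arange(50, dtype=float).reshape(10, 5)   # 10 rows, 4 inputs + 1 output

train, test = chronological_split(values, train_fraction=0.6)
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]

train_X3 = to_lstm_input(train_X)
test_X3 = to_lstm_input(test_X)

print(train_X3.shape, train_y.shape)
print(test_X3.shape, test_y.shape)
```

No shuffling is involved, so every training row precedes every test row in time, which is what a forecasting evaluation needs.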

The figure depicts a typical multi-layered stacked LSTM-based neural network.

The following code snippet demonstrates how we train an LSTM model and plot the training and validation loss (figure: Training vs Validation Loss) before making a prediction. A later snippet shows a mechanism to compute the error metrics and inverse-scale the predicted outcome.

    # imports assume the Keras API bundled with TensorFlow
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense
    import matplotlib.pyplot as plt

    # design stacked LSTM network
    model = Sequential()
    model.add(LSTM(units=50, return_sequences=True, input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(LSTM(units=50, return_sequences=True))
    model.add(LSTM(units=50))
    model.add(Dense(units=1))
    model.compile(loss='mae', optimizer='adam')

    # fit network
    history = model.fit(train_X, train_y, epochs=1500, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)

    # plot history
    plt.figure(figsize=(14, 12))
    plt.plot(history.history['loss'], label='train')
    plt.plot(history.history['val_loss'], label='test')
    plt.legend()
    plt.show()

    # make a prediction
    y_predict = model.predict(test_X)
    test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

Figure: LSTM Model Prediction

**Bi-Directional LSTM**

As we know, a uni-directional LSTM uses its hidden state to preserve information only from inputs that have already passed through it. In contrast, a bidirectional LSTM runs the inputs in two directions, one from past to future and one from future to past. The LSTM that runs backward preserves information from the future, and by combining the two hidden states, the network is able at any point in time to preserve information from both past and future.

The following code snippet demonstrates how we train a Bi-LSTM model and plot the training and validation loss before making a prediction.

    train = values[:split_factor, :]
    test = values[split_factor:, :]
    print(np.shape(train))
    print(np.shape(test))

    # split into inputs and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

    # reshape input to be 3D [samples, timesteps, features]
    train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
    test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
    print(train_X.shape[1], train_X.shape[2])

    # imports assume the Keras API bundled with TensorFlow
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Bidirectional, LSTM, Dense

    # design bi-directional LSTM network
    model = Sequential()
    model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(Dense(1))
    model.compile(loss='mae', optimizer='adam')

    # fit network
    history = model.fit(train_X, train_y, epochs=1500, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)

The figure below illustrates the actual vs predicted outcome of the Bi-LSTM model after the predicted outcome has been inverse-transformed (to remove the effect of scaling).

Figure: Bi-LSTM Model Prediction
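The idea of running the inputs in both directions and combining the two hidden states can be sketched without any deep learning library. The example below deliberately swaps the LSTM cell for a plain tanh recurrent cell to keep it short, so it illustrates only the bidirectional wiring, not the LSTM gates; all names and the random weights are assumptions made for this illustration.

```python
import numpy as np

def rnn_pass(inputs, w_in, w_rec):
    """Run a plain tanh recurrent cell over inputs of shape (timesteps, features)."""
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for x_t in inputs:
        hidden = np.tanh(x_t @ w_in + hidden @ w_rec)
        states.append(hidden)
    return np.stack(states)

def bidirectional_pass(inputs, fwd_weights, bwd_weights):
    """Concatenate a past-to-future pass with a future-to-past pass."""
    forward = rnn_pass(inputs, *fwd_weights)
    # process the reversed sequence, then flip back so time steps line up
    backward = rnn_pass(inputs[::-1], *bwd_weights)[::-1]
    return np.concatenate([forward, backward], axis=1)

rng = np.random.default_rng(0)
timesteps, features, units = 5, 3, 4
inputs = rng.normal(size=(timesteps, features))
fwd = (rng.normal(scale=0.5, size=(features, units)), rng.normal(scale=0.5, size=(units, units)))
bwd = (rng.normal(scale=0.5, size=(features, units)), rng.normal(scale=0.5, size=(units, units)))

states = bidirectional_pass(inputs, fwd, bwd)
print(states.shape)

# change only the last time step and compare the state at the first time step
changed = inputs.copy()
changed[-1] += 1.0
states_changed = bidirectional_pass(changed, fwd, bwd)
print(np.array_equal(states[0, :units], states_changed[0, :units]))
print(np.array_equal(states[0, units:], states_changed[0, units:]))
```

At the first time step, the forward half of the state is unaffected by a change in the last input, while the backward half picks it up. That is the sense in which the combined state carries information from both past and future.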

**CNN (Convolutional Neural Network)**

We also used a CNN to evaluate model performance for single-step time-series prediction.

The following code snippet demonstrates how we train a CNN model and plot the training and validation loss before making a prediction.

    # imports assume the Keras API bundled with TensorFlow
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

    # reshape input to be 3D [samples, timesteps, features];
    # here each input column is treated as a step with a single feature
    train_X = train_X.reshape((train_X.shape[0], train_X.shape[1], 1))
    test_X = test_X.reshape((test_X.shape[0], test_X.shape[1], 1))
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

    # CNN
    model = Sequential()
    model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Conv1D(filters=64, kernel_size=2, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    model.summary()

    # fit model
    history = model.fit(train_X, train_y, epochs=1500, batch_size=72, validation_data=(test_X, test_y), verbose=2, shuffle=False)
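It can help to trace how the sequence length shrinks through the convolution and pooling layers before `Flatten`. The sketch below does that arithmetic in plain Python, assuming no padding, a convolution stride of 1, and a pooling stride equal to the pool size (the Keras defaults). The starting length of 23 is an assumption based on the number of input columns left after the drop step.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of a 1-D convolution without padding."""
    return (length - kernel_size) // stride + 1

def pool1d_output_length(length, pool_size, stride=None):
    """Output length of 1-D max pooling without padding."""
    stride = pool_size if stride is None else stride
    return (length - pool_size) // stride + 1

length = 23      # assumed number of input columns per sample
filters = 64
trace = []

# same layer order as the CNN above: conv, pool, conv, pool
for kind, size in [("conv", 2), ("pool", 2), ("conv", 2), ("pool", 2)]:
    if kind == "conv":
        length = conv1d_output_length(length, size)
    else:
        length = pool1d_output_length(length, size)
    trace.append((kind, length, filters))

flattened = length * filters
print(trace)
print(flattened)
```

Under these assumptions the lengths go 23, 22, 11, 10, 5, so the `Flatten` layer would pass 5 x 64 = 320 values to the dense layer; `model.summary()` is the authoritative way to confirm the real shapes.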

**Trained Model for Prediction**

    # make a prediction
    y_predict = model.predict(test_X)
    test_X = test_X.reshape((test_X.shape[0], test_X.shape[1]))
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

**Inverse Transform Predictions and Computation of Error**

    from numpy import concatenate
    from sklearn.metrics import mean_squared_error

    # invert scaling for forecast: pad the prediction to the scaler's full width
    inv_y_predict = concatenate((y_predict, test_X[:, -(no_features):]), axis=1)
    inv_y_predict = scaler.inverse_transform(inv_y_predict)
    inv_y_predict = inv_y_predict[:, 0]

    # invert scaling for actual
    test_y = test_y.reshape((len(test_y), 1))
    inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
    inv_y = scaler.inverse_transform(inv_y)
    inv_y = inv_y[:, 0]

    # calculate RMSE
    rmse = np.sqrt(mean_squared_error(inv_y, inv_y_predict))
    print('Test RMSE: %.3f' % rmse)

    # index actual and predicted values by date, starting at the first test date
    pred_len = len(inv_y_predict)
    print(pred_len)
    dateEnd = daterange[split_factor + 1]
    print(dateEnd)
    pred_index = pd.date_range(start=dateEnd, periods=pred_len, freq='D')
    inv_y_actual = pd.Series(inv_y, pred_index)
    inv_y_predicted = pd.Series(inv_y_predict, pred_index)

The figure below illustrates the actual vs predicted outcome of the CNN model after the predicted outcome has been inverse-transformed (to remove the effect of scaling).

Figure: CNN Model Prediction
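The concatenation in the inverse-transform step exists because the scaler was fitted on all columns at once, so it expects an input of the same width. The sketch below shows, with a hand-written min-max inverse in NumPy, that padding the prediction and keeping column 0 gives the same numbers as inverting column 0 directly from its own minimum and maximum. The function names and toy values are assumptions for illustration; with scikit-learn, a fitted `MinMaxScaler` exposes the per-column values as `data_min_` and `data_max_`.

```python
import numpy as np

def inverse_first_column(scaled_col, data_min, data_max):
    """Undo [0, 1] min-max scaling for column 0 using its own min and max."""
    scaled_col = np.asarray(scaled_col, dtype=float)
    return scaled_col * (data_max[0] - data_min[0]) + data_min[0]

def inverse_via_padding(scaled_col, filler, data_min, data_max):
    """Pad the column to full width, invert every column, keep column 0."""
    column = np.asarray(scaled_col, dtype=float).reshape(-1, 1)
    padded = np.concatenate([column, filler], axis=1)
    restored = padded * (data_max - data_min) + data_min
    return restored[:, 0]

# toy per-column minima and maxima for a 3-column dataset
data_min = np.array([100.0, 0.0, 5.0])
data_max = np.array([900.0, 1.0, 25.0])

predicted_scaled = np.array([0.0, 0.25, 1.0])
filler = np.zeros((3, 2))   # content of the other columns does not affect column 0

direct = inverse_first_column(predicted_scaled, data_min, data_max)
padded = inverse_via_padding(predicted_scaled, filler, data_min, data_max)

print(direct)
print(np.allclose(direct, padded))
```

Because min-max scaling treats each column independently, the values used to pad the other columns do not change the result for column 0; only the position of the predicted column matters.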

**CNN + LSTM**

Here we have used Conv1D with a TimeDistributed layer, which is then fed to a single layer of LSTM to predict different sequences, as illustrated by the figure below. The CNN model is built first, with each layer of the CNN wrapped in a TimeDistributed layer, and is then added to the LSTM model.

An alternative approach is to construct the CNN model first and then add it to the LSTM model by wrapping the entire sequence of CNN layers in a single TimeDistributed layer.

The TimeDistributed layer is primarily used to present several sets of data (say, sequences or images) that are chronologically ordered, in order to detect trends, movements, actions, and directions.

    # imports assume the Keras API bundled with TensorFlow
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import TimeDistributed, Conv1D, MaxPooling1D, Flatten, LSTM, Dense

    # split into inputs and outputs
    train_X, train_y = train[:, :-1], train[:, -1]
    test_X, test_y = test[:, :-1], test[:, -1]
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

    # LSTM + CNN: reshape to [samples, subsequences, timesteps, features]
    subsequences = 1
    timesteps = train_X.shape[1]
    X_train_series_sub = train_X.reshape((train_X.shape[0], subsequences, timesteps, 1))
    X_valid_series_sub = test_X.reshape((test_X.shape[0], subsequences, timesteps, 1))
    print('Train set shape', X_train_series_sub.shape)
    print('Validation set shape', X_valid_series_sub.shape)

    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, X_train_series_sub.shape[2], X_train_series_sub.shape[3])))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')

    history = model.fit(X_train_series_sub, train_y, validation_data=(X_valid_series_sub, test_y), epochs=1500, verbose=2)
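What a TimeDistributed wrapper does to the 4-D input can be imitated in NumPy: the same operation is applied separately to every subsequence, and the results are stacked back into a sequence that a recurrent layer can read. This is a conceptual sketch only; `to_subsequences`, `time_distributed`, and `max_pool_flatten` are hypothetical stand-ins, not Keras functions.

```python
import numpy as np

def to_subsequences(features, subsequences):
    """Reshape [samples, width] into [samples, subsequences, timesteps, 1]."""
    samples, width = features.shape
    if width % subsequences:
        raise ValueError("width must be divisible by the number of subsequences")
    return features.reshape(samples, subsequences, width // subsequences, 1)

def time_distributed(layer_fn, batch):
    """Apply layer_fn independently to every subsequence of every sample."""
    return np.stack([np.stack([layer_fn(sub) for sub in sample]) for sample in batch])

def max_pool_flatten(sub, pool_size=2):
    """Toy stand-in for max pooling followed by flatten on one [timesteps, 1] block."""
    usable = (sub.shape[0] // pool_size) * pool_size
    return sub[:usable, 0].reshape(-1, pool_size).max(axis=1)

features = np.arange(24, dtype=float).reshape(2, 12)    # 2 samples, 12 values each
blocks = to_subsequences(features, subsequences=3)      # 3 blocks of 4 steps
pooled = time_distributed(max_pool_flatten, blocks)     # one vector per block

print(blocks.shape)
print(pooled.shape)
print(pooled[0])
```

The output has shape [samples, subsequences, values], which is the 3-D layout an LSTM layer expects. With `subsequences = 1`, as in the model above, the LSTM receives a sequence of length one per sample.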

The prediction and inverse scaling help to yield the actual predicted outcomes.

    # prediction (LSTM + CNN); stored as y_predict so that the inverse
    # scaling below uses this model's output
    y_predict = model.predict(X_valid_series_sub)
    print(y_predict)
    test_X = X_valid_series_sub.reshape((X_valid_series_sub.shape[0], X_valid_series_sub.shape[2]))

    # invert scaling for forecast
    inv_y_predict = concatenate((y_predict, test_X[:, -(no_features):]), axis=1)
    inv_y_predict = scaler.inverse_transform(inv_y_predict)
    inv_y_predict = inv_y_predict[:, 0]

    # invert scaling for actual
    test_y = test_y.reshape((len(test_y), 1))
    inv_y = concatenate((test_y, test_X[:, 1:]), axis=1)
    inv_y = scaler.inverse_transform(inv_y)
    inv_y = inv_y[:, 0]

    # calculate RMSE
    rmse = np.sqrt(mean_squared_error(inv_y, inv_y_predict))
    print('Test RMSE: %.3f' % rmse)

    # index actual and predicted values by date, starting at the first test date
    pred_len = len(inv_y_predict)
    print(pred_len)
    dateEnd = daterange[split_factor + 1]
    print(dateEnd)
    pred_index = pd.date_range(start=dateEnd, periods=pred_len, freq='D')
    inv_y_actual = pd.Series(inv_y, pred_index)
    inv_y_predicted = pd.Series(inv_y_predict, pred_index)

The figure below illustrates the actual vs predicted outcome of the stacked LSTM and CNN model after the predicted outcome has been inverse-transformed (to remove the effect of scaling).

Figure: LSTM with CNN

**Training and Validation Loss**

    Epoch 1494/1500
    58/58 - 0s - loss: 3.2615e-06 - val_loss: 0.0056
    Epoch 1495/1500
    58/58 - 0s - loss: 3.3479e-06 - val_loss: 0.0056
    Epoch 1496/1500
    58/58 - 0s - loss: 3.3705e-06 - val_loss: 0.0053
    Epoch 1497/1500
    58/58 - 0s - loss: 3.2291e-06 - val_loss: 0.0054
    Epoch 1498/1500
    58/58 - 0s - loss: 3.0793e-06 - val_loss: 0.0056
    Epoch 1499/1500
    58/58 - 0s - loss: 3.8484e-06 - val_loss: 0.0055
    Epoch 1500/1500
    58/58 - 0s - loss: 3.8213e-06 - val_loss: 0.0054

Figure: Train vs Validation Loss

The following table depicts the computed RMSE metrics for each of the deep learning models.

| Deep Learning Method | RMSE |
| --- | --- |
| LSTM | 5262.208 |
| BI-LSTM | 804.197 |
| Stacked LSTM | 2730.476 |
| CNN | 8634.9 |
| LSTM + CNN | 8634.9 |

Table: Error Metrics of Deep Learning-based Models

**Conclusion**

Here we see that the bi-directional LSTM works best, followed by multiple stacked layers of LSTM and then a single LSTM layer. This is just a basic study, and results might differ based on the dataset. In the next blog (series 2), we will look at different multi-step prediction results.

More extensive hyper-parameter tuning is needed, along with dynamic data featuring a change in medical facilities and supplies.

For the complete source code, check out https://github.com/sharmi1206/covid-19-analysis

**Acknowledgments**

Special thanks to machinelearningmastery.com, as some of the concepts have been taken from there.

**References**