The goal of the model training process is to produce a trained model that you can later use to make predictions. We want to be able to give the model a set of input features, X, and have it predict the value of some output feature, y.
It is important to first establish the type of problem to be solved, e.g., whether it is a classification or a regression problem. The framing of the problem influences both the choice of algorithms in the training process and the approaches taken to reach the desired result.
An important prerequisite is to understand and transform the data, create new features, and select the features that are most relevant to the training process. Once both the problem type and the training features are defined, the next steps are:
- Decide whether to scale or encode your data
- Split the data (i.e., into training, validation, and test datasets)
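As an illustration of the scaling and encoding step, here is a minimal sketch using only the Python standard library. The helper names `standardize` and `one_hot` and the example data are hypothetical, chosen for this sketch; in practice a library such as scikit-learn is commonly used for the same job.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale numeric values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Encode categorical values as 0/1 indicator vectors."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Hypothetical example data: one numeric and one categorical feature.
ages = [20, 30, 40]
colors = ["red", "green", "red"]

scaled_ages = standardize(ages)
categories, encoded_colors = one_hot(colors)
```

When the data is split, the scaling statistics (mean and standard deviation) are usually computed from the training portion only and then reused on the validation and test portions.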
Parameters and Hyperparameters:
When we train a model, a large part of the process involves learning the values of the parameters of the model. For example, earlier we looked at the general form for linear regression:
y = B0 + B1*x1 + B2*x2 + B3*x3 + … + Bn*xn
The coefficients in this equation, B0 … Bn, determine the intercept and slope of the regression line. When training a linear regression model, we use the training data to figure out what the value of these parameters should be. Thus, we can say that a major goal of model training is to learn the values of the model parameters.
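To make "learning the parameters" concrete, here is a minimal sketch for the single-feature case, y = B0 + B1*x, using the ordinary least squares closed form. The function names and the toy data are assumptions made for this illustration.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Learn the intercept (b0) and slope (b1) from training data via least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x):
    """Apply the learned parameters to a new input."""
    return b0 + b1 * x


# Toy training data generated from y = 2 + 3x (no noise).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 8.0, 11.0, 14.0]

b0, b1 = fit_simple_linear_regression(xs, ys)
```

Because the toy data lies exactly on a line, the learned parameters recover the intercept 2 and the slope 3, and `predict(b0, b1, 5.0)` gives a value of about 17.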
In contrast, some model parameters are not learned from the data. These are called hyperparameters and their values are set before training. Here are some examples of hyperparameters:
- The number of layers in a deep neural network
- The number of clusters (such as in a k-means clustering algorithm)
- The learning rate of the model
We must choose some values for these hyperparameters, but we do not necessarily know what the best values will be prior to training. Because of this, a common approach is to make a best guess, train the model, and then adjust or tune the hyperparameters based on the model’s performance.
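The guess, train, and adjust loop can be sketched with the learning rate as the hyperparameter. The example below fits a single-feature linear regression by gradient descent; the candidate learning rates, the epoch count, and the toy data are assumptions for illustration. To keep it short, the comparison uses the training error, whereas tuning is normally based on held-out validation data.

```python
def train(xs, ys, learning_rate, n_epochs=200):
    """Learn b0 and b1 by gradient descent on the mean squared error."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_epochs):
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = 2 * sum(errors) / n
        grad_b1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


def mse(xs, ys, b0, b1):
    """Mean squared error of the line b0 + b1*x on the given data."""
    return sum(((b0 + b1 * x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Try several guesses for the hyperparameter and record how each one performs.
results = {}
for lr in [0.001, 0.01, 0.1]:
    b0, b1 = train(xs, ys, learning_rate=lr)
    results[lr] = mse(xs, ys, b0, b1)

best_lr = min(results, key=results.get)
```

Here the parameters b0 and b1 are learned inside `train`, while `learning_rate` and `n_epochs` are fixed before training starts, which is what makes them hyperparameters.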
Splitting the Data:
As mentioned in the video, we typically want to split our data into three parts:
- Training data
- Validation data
- Test data
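A three-way split can be sketched as follows. The function name `train_val_test_split`, the 60/20/20 proportions, and the fixed seed are assumptions for this example, not values prescribed by the lesson.

```python
import random


def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into training, validation, and test lists."""
    shuffled = list(rows)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical dataset of 100 rows, represented by their indices.
rows = list(range(100))
train, val, test = train_val_test_split(rows)
```

Each row ends up in exactly one of the three parts, so no example used for training is reused for validation or testing.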
We use the training data to learn the values for the parameters. Then, we check the model’s performance on the validation data and tune the hyperparameters until the model performs well on it. Each time we adjust a hyperparameter, we retrain the model and evaluate it on the validation data again to see if its performance has improved.
Finally, once we believe we have our finished model (with both parameters and hyperparameters optimized), we will want to do a final check of its performance, and we need to do this on some fresh test data that we did not use during the training process.
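The whole workflow can be sketched end to end. This example swaps in ridge regression with a single feature because it has one obvious hyperparameter, the penalty strength `alpha`; that choice, the candidate values, and the synthetic data are all assumptions made for illustration.

```python
import random
from statistics import mean


def fit_ridge(xs, ys, alpha):
    """Learn b0 and b1 from training data; alpha is set before training."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
    b0 = y_bar - b1 * x_bar
    return b0, b1


def mse(xs, ys, b0, b1):
    """Mean squared error of the line b0 + b1*x on the given data."""
    return sum(((b0 + b1 * x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data: y = 4 + 1.5x plus Gaussian noise (already in random order).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(90)]
ys = [4 + 1.5 * x + rng.gauss(0, 1) for x in xs]

# 60% training, 20% validation, 20% test.
train_x, val_x, test_x = xs[:54], xs[54:72], xs[72:]
train_y, val_y, test_y = ys[:54], ys[54:72], ys[72:]

# Tune the hyperparameter using the validation data only.
best_alpha, best_val_mse = None, float("inf")
for alpha in [0.0, 1.0, 10.0, 100.0]:
    b0, b1 = fit_ridge(train_x, train_y, alpha)
    val_mse = mse(val_x, val_y, b0, b1)
    if val_mse < best_val_mse:
        best_alpha, best_val_mse = alpha, val_mse

# Final check: refit with the chosen alpha and evaluate once on the test data.
b0, b1 = fit_ridge(train_x, train_y, best_alpha)
test_mse = mse(test_x, test_y, b0, b1)
```

The test data is touched only once, after `best_alpha` has been chosen, so `test_mse` is an estimate of performance on data the model has not seen.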