We then define a very simple model.
We will choose CrossEntropy as our loss function and accuracy as our metric.
We will also set the learning rate and number of epochs to get started.
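A sketch of what this setup could look like (the sizes, hyperparameter values, and names such as n_in, nh, n_out and bs are illustrative assumptions, not the exact values used here):

```python
import torch
from torch import nn
import torch.nn.functional as F

class Model():
    def __init__(self, n_in, nh, n_out):
        # keep the layers in a plain Python list for now
        self.layers = [nn.Linear(n_in, nh), nn.ReLU(), nn.Linear(nh, n_out)]

    def __call__(self, x):
        for l in self.layers:
            x = l(x)
        return x

loss_func = F.cross_entropy                      # CrossEntropy loss

def accuracy(out, yb):
    # fraction of predictions whose argmax matches the target class
    return (torch.argmax(out, dim=1) == yb).float().mean()

n_in, nh, n_out = 784, 50, 10                    # e.g. MNIST-sized inputs
lr, epochs, bs = 0.5, 1, 64                      # example hyperparameters
model = Model(n_in, nh, n_out)
```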
We will now write the training loop from scratch.
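Assuming training tensors named x_train and y_train of length n, a from-scratch loop along these lines slices each mini-batch by hand and updates every layer's weights and biases individually:

```python
n = x_train.shape[0]

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        # slice out one mini-batch of inputs and targets
        xb = x_train[i * bs : (i + 1) * bs]
        yb = y_train[i * bs : (i + 1) * bs]

        loss = loss_func(model(xb), yb)
        loss.backward()

        # update and zero the gradients of every layer by hand
        with torch.no_grad():
            for l in model.layers:
                if hasattr(l, 'weight'):
                    l.weight -= l.weight.grad * lr
                    l.bias   -= l.bias.grad * lr
                    l.weight.grad.zero_()
                    l.bias.grad.zero_()
```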
However, this does not look very efficient, especially the part where we update our weights. Instead of going through every layer and updating its parameters one by one, it would be nice if we could update all our parameters together. We want to be able to do something like this:
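Something like the following, where a single model.parameters() call hands us every trainable tensor and one zero_grad() call resets all the gradients:

```python
with torch.no_grad():
    for p in model.parameters():   # every weight and bias, regardless of layer
        p -= p.grad * lr
    model.zero_grad()              # reset all gradients in one call
```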
To do this, we want our model to store information about all of its layers, and then return it when we call .parameters() on it. Let's create a dummy model as follows:
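A sketch of such a dummy model; here the registry is the self.modules dictionary described in the next paragraph, and the k != 'modules' guard simply stops the dictionary from registering itself:

```python
class DummyModule():
    def __init__(self, n_in, nh, n_out):
        self.modules = {}                      # registry of child layers
        self.l1 = nn.Linear(n_in, nh)
        self.l2 = nn.Linear(nh, n_out)

    def __setattr__(self, k, v):
        # called on every attribute assignment; record the layer first
        if k != 'modules':
            self.modules[k] = v
        super().__setattr__(k, v)

    def __call__(self, x):
        return self.l2(F.relu(self.l1(x)))

    def parameters(self):
        # walk the registry and yield every layer's parameters
        for m in self.modules.values():
            yield from m.parameters()
```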
In this model, we make use of Python's special dunder method __setattr__. This method is called every time we set an attribute, e.g. self.l1 = nn.Linear(n_in, nh). So now, every time we create a layer, we enter this method and store information about the layer in a dictionary called self.modules. We can then use this dictionary to generate all our parameters, as shown above.
This is what PyTorch does for us behind the scenes when we inherit from nn.Module, and this is why we have to call super().__init__() in our constructor.
We can then call named_children() on our model to get our layers.
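Put together, a minimal nn.Module version and a peek at its registered children might look like this (the layer names l1 and l2 follow the dummy model above):

```python
class Model(nn.Module):
    def __init__(self, n_in, nh, n_out):
        super().__init__()                      # sets up nn.Module's internal registries
        self.l1 = nn.Linear(n_in, nh)
        self.l2 = nn.Linear(nh, n_out)

    def forward(self, x):
        return self.l2(F.relu(self.l1(x)))

model = Model(n_in, nh, n_out)
for name, child in model.named_children():
    print(name, child)                          # e.g. "l1 Linear(...)", "l2 Linear(...)"
```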
And we can use model.parameters() to update all our parameters directly. Our training loop now looks like this:
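With the parameters handled by nn.Module, the loop reduces to roughly:

```python
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        xb = x_train[i * bs : (i + 1) * bs]
        yb = y_train[i * bs : (i + 1) * bs]

        loss = loss_func(model(xb), yb)
        loss.backward()

        # one generic update for every parameter, no per-layer code
        with torch.no_grad():
            for p in model.parameters():
                p -= p.grad * lr
            model.zero_grad()
```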
If we create a list of layers and simply do self.layers = layers, then we cannot access their parameters using model.parameters(). We need to add every layer to our module as follows:
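One way to do that is to register each layer explicitly with add_module, roughly like this:

```python
class Model(nn.Module):
    def __init__(self, layers):
        super().__init__()
        self.layers = layers
        # a plain list is invisible to nn.Module, so register each layer by name
        for i, l in enumerate(self.layers):
            self.add_module(f'layer_{i}', l)

    def forward(self, x):
        for l in self.layers:
            x = l(x)
        return x

model = Model([nn.Linear(n_in, nh), nn.ReLU(), nn.Linear(nh, n_out)])
```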
PyTorch does the same thing using nn.ModuleList, so we can instead do:
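That is, wrap the list in nn.ModuleList so every layer is registered automatically; a sketch:

```python
class SequentialModel(nn.Module):
    def __init__(self, layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)    # registers each layer as a child module

    def forward(self, x):
        for l in self.layers:
            x = l(x)
        return x

model = SequentialModel([nn.Linear(n_in, nh), nn.ReLU(), nn.Linear(nh, n_out)])
```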
However, this is still clunky, so we can directly do:
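That is, hand the layers straight to nn.Sequential:

```python
model = nn.Sequential(nn.Linear(n_in, nh), nn.ReLU(), nn.Linear(nh, n_out))
```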
We now know how nn.Sequential is implemented.
Let's further improve our code for the weight update so that the manual parameter loop shrinks to a couple of one-line calls, opt.step() and opt.zero_grad(). To do this, we will define an Optimizer class and put these functionalities inside it.
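A bare-bones sketch of such an Optimizer, where step() performs the SGD update and zero_grad() resets the gradients:

```python
class Optimizer():
    def __init__(self, params, lr=0.5):
        self.params, self.lr = list(params), lr

    def step(self):
        # the weight update that used to sit inside the training loop
        with torch.no_grad():
            for p in self.params:
                p -= p.grad * self.lr

    def zero_grad(self):
        for p in self.params:
            p.grad.zero_()

opt = Optimizer(model.parameters(), lr)
```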
Our training loop looks much cleaner now.
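The manual update block collapses into two calls:

```python
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        xb = x_train[i * bs : (i + 1) * bs]
        yb = y_train[i * bs : (i + 1) * bs]

        loss = loss_func(model(xb), yb)
        loss.backward()
        opt.step()          # update all parameters
        opt.zero_grad()     # reset gradients for the next batch
```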
Understanding datasets and dataloaders
The next bit we are going to improve is mini-batching. We are iterating through the x and y mini-batches separately, which is clumsy and error-prone. Hence, we will create a dataset so that we can work on them together.
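A dataset here only needs to support len() and indexing that returns an (x, y) pair, so a tiny class is enough (a sketch; PyTorch's own torch.utils.data.Dataset plays the same role):

```python
class Dataset():
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        # works for a single index or a slice, returning inputs and targets together
        return self.x[i], self.y[i]

train_ds = Dataset(x_train, y_train)
```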
This modifies our loop as follows:
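Only the mini-batch slicing changes; one indexing step now gives both tensors:

```python
for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        xb, yb = train_ds[i * bs : (i + 1) * bs]   # inputs and targets in one go
        loss = loss_func(model(xb), yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
```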
Finally we create a class for data loaders to further clean up the mini-batching process.
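A minimal sketch of such a data loader, driven by a generator:

```python
class DataLoader():
    def __init__(self, ds, bs):
        self.ds, self.bs = ds, bs

    def __iter__(self):
        # yield one (xb, yb) mini-batch at a time
        for i in range(0, len(self.ds), self.bs):
            yield self.ds[i : i + self.bs]

train_dl = DataLoader(train_ds, bs)
```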
This works because yield produces one mini-batch at a time, picking up where it left off on the next iteration. Our final training loop is as easy to read as plain English.
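Putting it all together:

```python
for epoch in range(epochs):
    for xb, yb in train_dl:
        loss = loss_func(model(xb), yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
```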
That will be it for this article.
If you liked it, give it at least 50 claps.
If you want to learn more about deep learning you can check out my deep learning series below.
Deep learning from the foundations, fast.ai