Hello World! In this post, I’ll unfold every detail of a Convolutional Neural Network (CNN) architecture, starting from the very basics. The only prerequisites are basic knowledge of linear algebra and some familiarity with Python.
How is an image represented in a computer?
Computers don’t see images the way we humans do. As we all know, at the lowest level they only understand 0s and 1s. Let’s see how an image is represented in a computer.
A color image has 3 channels: Red, Green, and Blue. The image is broken down into tiny elements called pixels. A pixel (short for picture element) represents one color. An image with a resolution of 1024 by 798 pixels has 1024 x 798 = 817,152 pixels.
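In short, each channel of an image can be represented as a matrix, where each value denotes the intensity of one pixel (typically an integer from 0 to 255 for 8-bit images). A grayscale image needs one such matrix; a color image needs three, one per channel.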
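To make this concrete, here is a minimal sketch that assumes NumPy (the post itself does not use it). An RGB image is commonly stored as a height x width x 3 array, and slicing out one channel gives one intensity matrix. The pixel values are made up for illustration.

```python
import numpy as np

# A tiny, made-up 2 x 3 color image: 2 rows, 3 columns, 3 channels (R, G, B).
# Each value is an 8-bit intensity between 0 and 255.
image = np.array(
    [
        [[255, 0, 0], [0, 255, 0], [0, 0, 255]],        # red, green, blue pixels
        [[255, 255, 255], [0, 0, 0], [128, 128, 128]],  # white, black, gray pixels
    ],
    dtype=np.uint8,
)

height, width, channels = image.shape
print(height, width, channels)   # 2 3 3

# Each channel on its own is a 2 x 3 matrix of intensities
red_channel = image[:, :, 0]
print(red_channel.tolist())      # [[255, 0, 0], [255, 0, 128]]

# Pixel count = height * width (1024 * 798 = 817152 for the example in the text)
print(height * width)            # 6
```

The channel-last layout (height, width, channels) is also what the `input_shape=(28,28,1)` argument in the Keras model later in this post uses.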
What is Convolution and how does it work?
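‘f * g’ denotes ‘f’ convolved with ‘g’. The convolution used in CNNs is different from the one we studied in school (and a lot easier :p). Strictly speaking, it is cross-correlation: the filter slides over the image without being flipped first.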
I hope the pictures below help you understand the technique.
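The filter slides over the input one position at a time. At each position it covers a region of the input (the unshaded region in the pictures), multiplies each element of that region by the corresponding element of the filter, sums all the products, and stores the sum in the corresponding cell of the output matrix.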
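The sliding multiply-and-sum step can be sketched in a few lines of NumPy. This is an illustrative implementation, not the post's own code: `convolve2d` is a hypothetical helper name, stride 1 and no padding are assumed, and the input and filter values are made up.

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' 2D convolution with stride 1 and no padding.

    As in deep learning libraries, the kernel is not flipped, so strictly
    speaking this computes cross-correlation.
    """
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = image[i:i + f_h, j:j + f_w]   # region currently under the filter
            out[i, j] = np.sum(region * kernel)     # multiply element-wise, then sum
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
kernel = np.array([[1, 0],
                   [0, 1]])
result = convolve2d(image, kernel)
print(result.tolist())   # [[6.0, 8.0], [12.0, 14.0]]
```

For example, the top-left output value 6 comes from 1*1 + 2*0 + 4*0 + 5*1. The explicit loops are slow but mirror the pictures step by step.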
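In Pic 1, the leftmost matrix represents the input image (of size n x n), which is convolved with a filter (of size f x f). The result is an output matrix of size (n - f + 1) x (n - f + 1), because an f x f filter fits into exactly n - f + 1 positions along each dimension when it moves one step at a time without padding.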
We can confirm this in the pictures: the input image is 7 x 7 and the filter is 3 x 3, so after convolution we obtain an output of size (7 - 3 + 1) x (7 - 3 + 1) = 5 x 5.
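The (n - f + 1) formula can also be checked numerically. The sketch below assumes NumPy 1.20 or later, which provides `numpy.lib.stride_tricks.sliding_window_view`; it enumerates every 3 x 3 region of a placeholder 7 x 7 input and then applies a placeholder all-ones filter to each region.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n, f = 7, 3
image = np.arange(n * n).reshape(n, n)   # placeholder 7 x 7 input
kernel = np.ones((f, f))                 # placeholder 3 x 3 filter

# Every f x f region the filter can visit (stride 1, no padding)
windows = sliding_window_view(image, (f, f))
print(windows.shape)   # (5, 5, 3, 3): a 5 x 5 grid of 3 x 3 regions

# Multiply each region by the filter and sum: one number per position
output = np.einsum("ijkl,kl->ij", windows, kernel)
print(output.shape)    # (5, 5), i.e. (n - f + 1, n - f + 1)
```

The first two axes of `windows` are exactly the output grid, which is why the output is 5 x 5.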
Are we ready to create and train our first model?
Well, yes! We are. But first, let’s look at the big picture of a simple CNN architecture.
The input image (6, 6, 3) is convolved with filters of size (3, 3, 3); each filter must have as many channels as the input. There are 4 filters, and each one produces a 4 x 4 feature map, so the output has size (4, 4, 4).
You can check the dimensions with the same formula: n - f + 1 = 6 - 3 + 1 = 4.
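A multi-channel, multi-filter version of the same idea can be sketched as follows. This is an illustration under stated assumptions: square inputs, stride 1, no padding, random values, and a filter array laid out as (num_filters, f, f, channels), which is our own choice for readability. `conv_multi` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))         # height x width x channels
filters = rng.random((4, 3, 3, 3))    # 4 filters, each 3 x 3 x 3

def conv_multi(image, filters):
    n = image.shape[0]
    num_filters, f = filters.shape[0], filters.shape[1]
    out = np.zeros((n - f + 1, n - f + 1, num_filters))
    for k in range(num_filters):              # one feature map per filter
        for i in range(n - f + 1):
            for j in range(n - f + 1):
                region = image[i:i + f, j:j + f, :]   # 3 x 3 block across all channels
                out[i, j, k] = np.sum(region * filters[k])
    return out

output = conv_multi(image, filters)
print(output.shape)   # (4, 4, 4)
```

Each filter sums over all input channels at once, so it collapses the channel dimension into a single 4 x 4 map; stacking the 4 maps gives the (4, 4, 4) output.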
Now, let’s create a digit classifier for 28 x 28 grayscale images. We will use Keras to create our model in fewer than 10 lines of code.
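from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten
model = Sequential()
# two convolutional layers with 3 x 3 filters: 64 filters, then 32 filters
model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, kernel_size=3, activation='relu'))
# flatten the feature maps and output probabilities for the 10 digit classes
model.add(Flatten())
model.add(Dense(10, activation='softmax'))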
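To see what the two Conv2D layers do to a 28 x 28 x 1 input, we can apply the (n - f + 1) formula by hand. This sketch relies on Keras's Conv2D defaults of stride 1 and `padding='valid'` (no padding); the helper names are hypothetical. It also counts each layer's parameters as f x f x input channels weights per filter plus one bias per filter, which should match what `model.summary()` reports.

```python
def conv_output_size(n, f):
    """Height/width after a 'valid' convolution with stride 1: n - f + 1."""
    return n - f + 1

def conv_params(f, c_in, c_out):
    """Weights (f * f * c_in per filter) plus one bias per filter."""
    return f * f * c_in * c_out + c_out

size, channels = 28, 1            # input_shape=(28, 28, 1)
layers = [(3, 64), (3, 32)]       # (kernel_size, filters) of each Conv2D layer
shapes, params = [], []
for f, num_filters in layers:
    params.append(conv_params(f, channels, num_filters))
    size, channels = conv_output_size(size, f), num_filters
    shapes.append((size, size, channels))

print(shapes)                   # [(26, 26, 64), (24, 24, 32)]
print(params)                   # [640, 18464]
print(size * size * channels)   # 18432 values once flattened
```

A Flatten layer turns the final 24 x 24 x 32 volume into a vector of 18,432 values, which a Dense layer can then map to the 10 digit classes.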
Next, we compile and train our model.
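# compile the model, using accuracy to measure model performance
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# train the model; X_train/X_test hold images of shape (N, 28, 28, 1)
# and y_train/y_test hold one-hot labels of shape (N, 10)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3)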
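The post does not show where `X_train`, `y_train`, `X_test` and `y_test` come from. Here is a minimal preprocessing sketch under stated assumptions: it uses random stand-in data with MNIST-like shapes so that it runs offline (real digits could come from `keras.datasets.mnist.load_data()`), and `preprocess` is a hypothetical helper. Scaling intensities to 0..1 is common practice rather than something the original prescribes. The one-hot step matters because `categorical_crossentropy` expects one-hot labels; Keras's `keras.utils.to_categorical` does the same job.

```python
import numpy as np

# Hypothetical stand-in data: random "images" and labels with MNIST-like shapes
rng = np.random.default_rng(42)
raw_images = rng.integers(0, 256, size=(100, 28, 28)).astype(np.uint8)
raw_labels = rng.integers(0, 10, size=100)

def preprocess(images, labels, num_classes=10):
    # Add the channel axis expected by input_shape=(28, 28, 1)
    x = images.reshape(-1, 28, 28, 1).astype("float32")
    # Scale 0..255 intensities into the range 0..1
    x /= 255.0
    # One-hot encode the digit labels, as categorical_crossentropy expects
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y

# Hold out the last 20 samples for validation
X_train, y_train = preprocess(raw_images[:80], raw_labels[:80])
X_test, y_test = preprocess(raw_images[80:], raw_labels[80:])
print(X_train.shape, y_train.shape)   # (80, 28, 28, 1) (80, 10)
print(X_test.shape, y_test.shape)     # (20, 28, 28, 1) (20, 10)
```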
Keras has easy-to-understand documentation; you should check out their site.