In this series, we will cover the concept of a neural network, the math behind it, and the popular types of neural networks and their architectures.
First, we need to understand what a neural network is. To do that, we will start from an example of a real-life problem and its solution using neural network logic.
Suppose you are in your room writing code and your 5-year-old cousin comes to you and shows you his painting.
You look at it and you see something like this:
And the question that immediately comes to your mind is:
WHAT THE HECK IS THAT?
But you are smart! So what you can do is come up with a hypothesis: “this is a cat”. But how can you know? You can write down all the features of the creature that you see. It would be logical that all of these features should affect the hypothesis, right?
But how can you be sure that all these features are actually what you think they are? How can you know that what you see as a wing really is one? Well, you can always ask, but since that would be too easy and not very machine-learning-like, what you can do is assign a probability to each thing that you see. You can be quite sure there are 2 eyes, but with 4 legs, not so much: one seems to be deformed, so the probability of the creature having 4 legs should be lower. With the tail, you can’t be quite sure it was really meant to be a tail, given its color and cloudy look… Let’s not get any deeper into that and present the full list of features with their probabilities.
Then again, you have to ask yourself: are all those features equally important? In fact, a cat is still a cat if it loses one eye. On the other hand, a creature that has wings is most definitely not a cat, except for maybe some genetic mutations caused by living next to a nuclear power plant that has been polluting the local area for 20 years and bribed the inspectors to look the other way while it dumped its waste into the local animal shelter. So our next goal is to decide how important each of those features is in making the final decision, and whether it increases or decreases the probability of the hypothesis.
Now that we have decided how important the features are and have them properly scaled, we can simply multiply each probability by its weight, add the results together, and get a score.
Y = 0.83 × (-0.87) + 0.95 × 0.42 + 0.66 × 0.69 + 0.72 × (-0.64) + 0.5 × 0.35 = -0.1535
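The same arithmetic can be checked in a few lines of Python. This is only a sketch: it assumes the first number in each product is the feature's probability and the second is its weight, and the variable names are made up for illustration.

```python
# Numbers copied from the formula above.
# Assumption: first factor = probability of the feature, second = its weight.
probabilities = [0.83, 0.95, 0.66, 0.72, 0.5]
weights = [-0.87, 0.42, 0.69, -0.64, 0.35]

# Multiply each probability by its weight and add everything up.
score = sum(p * w for p, w in zip(probabilities, weights))

print(round(score, 4))  # -0.1535
```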
Note that we write 0.83 instead of 83%, but the value is the same. Features with a positive weight add to the score, and features with a negative weight subtract from it. The output is negative, so we can conclude that this is probably not a cat, but there is one last thing we must do. With lots of features, the output score could become really large. We would like the output to be restricted to some interval, for example between 0 and 1, so that we can think of it as a probability between 0% and 100%. Why do we need this? Firstly, let’s say we want to predict something else based on the fact that there is a cat. Then the output of our function will become a feature in some other classification, so it must be in the same range as all the other features. Secondly, a deep neural network can’t rely on linear functions alone, but that’s a spoiler for further articles, so let’s not talk about it right now.
What function should we use to squash the output into that range? Well, there are quite a lot to choose from, but for now let’s use the sigmoid function: sigmoid(x) = 1 / (1 + e^(-x)).
It returns an output strictly between 0 and 1, so we can think about the output as a probability between 0% and 100%. Now we can write our formula in this format:
Y = sigmoid(0.83 × (-0.87) + 0.95 × 0.42 + 0.66 × 0.69 + 0.72 × (-0.64) + 0.5 × 0.35) = sigmoid(-0.1535) ≈ 0.46
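To see what the sigmoid does to a score, here is a minimal sketch using only Python's standard library. The function name `sigmoid` is our own choice; the large inputs 10 and -10 are arbitrary values picked to show the squashing.

```python
import math

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# The score from our drawing: slightly negative, so slightly below 50%.
print(round(sigmoid(-0.1535), 3))  # 0.462

# Very large or very small scores still stay inside (0, 1).
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```

A score of exactly 0 maps to 0.5, which matches the intuition that a zero score means "no idea either way".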
Congratulations, you have a neuron!
So what does a neuron do?
- It takes the inputs and multiplies them by their weights,
- then it sums them up,
- after that it applies the activation function to the sum.
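The three steps above can be sketched as one small Python function. The name `neuron` and its signature are assumptions made for this illustration, and sigmoid is used as the activation only because it is the one chosen in this article.

```python
import math

def neuron(inputs, weights):
    # Step 1: multiply each input by its weight.
    weighted = [x * w for x, w in zip(inputs, weights)]
    # Step 2: sum the products.
    total = sum(weighted)
    # Step 3: apply the activation function (sigmoid here).
    return 1.0 / (1.0 + math.exp(-total))

# The numbers from the drawing example: probabilities first, then weights.
cat_score = neuron([0.83, 0.95, 0.66, 0.72, 0.5],
                   [-0.87, 0.42, 0.69, -0.64, 0.35])
print(round(cat_score, 3))  # 0.462
```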
How does a neuron learn?
The neuron’s goal is to adjust the weights based on lots of examples of inputs and outputs. So let’s say we show the neuron a thousand examples of drawings of cats and drawings of non-cats, and for each of those examples we show which features are present and how sure we are that they are there. Based on these thousand images, the neuron decides:
- which features are important and positive (for example, every drawing of a cat had a tail in it, so the weight must be positive and large),
- which features are not important (for example, 2 eyes showed up in drawings of cats and non-cats alike, so the weight must be small),
- which features are important and negative (for example, every drawing containing a horn was in fact a drawing of a unicorn, not a cat, so the weight must be large and negative).
We will explain the math of learning in the next article. For now, all we need to know is that the neuron learns the weights based on the inputs and the desired outputs.
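Just to make "learning the weights" a little less abstract, here is a rough sketch of one common way to do it: nudge each weight a bit after every example, in the direction that reduces the error. This is an assumption on our part and not necessarily the method the next article uses. The toy data, the feature order (tail, horn), and the learning rate are all made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical toy data: ([tail probability, horn probability], is it a cat?)
examples = [
    ([1.0, 0.0], 1),
    ([0.9, 0.1], 1),
    ([0.8, 1.0], 0),
    ([0.1, 0.9], 0),
]

weights = [0.0, 0.0]   # start with no opinion about either feature
learning_rate = 0.5    # assumed value; controls the size of each nudge

for _ in range(1000):
    for inputs, target in examples:
        output = sigmoid(sum(x * w for x, w in zip(inputs, weights)))
        error = target - output
        # Move each weight in proportion to the error and to its input.
        weights = [w + learning_rate * error * x
                   for w, x in zip(weights, inputs)]

print(weights)  # tail weight ends up positive, horn weight negative
```

After training on this toy data, the tail weight is positive and the horn weight is negative, which is the same conclusion we reached by hand in the list above.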
As we said before, the output of our neuron can be an input to some other neuron. Likewise, the input of our neuron can be the output of some other neuron.
For example, we may want to predict whether there are features like horns, legs, etc. based on the pixels of the picture, and then, based on the fact that it is a drawing of a cat, predict what types of animals the kid likes.
This is where the activation function is especially useful: it keeps the output of every neuron in the same range, so those outputs can serve as inputs to the next layer of the network.
Note that the output of each neuron is usually not connected to a single neuron in the next layer but to many, if not all, of them. The information that the neuron predicted can be useful for many other neurons, so why wouldn’t we want to use it?
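Here is a minimal sketch of two such layers chained together, where every neuron in a layer reads all the outputs of the previous one. The helper name `layer`, the pixel values, and all the weights are hypothetical numbers chosen only to make the example run.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_rows):
    """Each row of weights is one neuron; every neuron sees all the inputs."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_rows]

# Hypothetical input: three pixel intensities.
pixels = [0.2, 0.9, 0.4]

# Hypothetical weights: 2 hidden neurons with 3 inputs each,
# then 1 output neuron that reads both hidden outputs.
hidden_weights = [[0.5, -0.3, 0.8],
                  [-0.6, 0.1, 0.4]]
output_weights = [[1.2, -0.7]]

hidden = layer(pixels, hidden_weights)   # outputs of the first layer
output = layer(hidden, output_weights)   # they become inputs to the next one

print(hidden, output)
```

Because each neuron ends with a sigmoid, every value in `hidden` is between 0 and 1, so the next layer can treat it exactly like any other feature.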
The black box
In this example, we showed that the neuron decides whether the image contains a cat based on specific features. In real life, we usually don’t know which features have been used to predict the final output. In our example, we stated that this is not a cat based on features like a horn. But in real life the neural network can choose features like pixels in certain positions and then process them through many, many classifications that we know nothing about, and based on that predict the final output. This is called the black box approach, because we only see the input and the output, not all the computations done in the middle. Because of that, the layers between the input and output layers are called the hidden layers.
So what is a neural network?
I think we are ready for the final definition of a neural network.
A neural network is a set of neurons organized in layers. Each neuron is a mathematical operation that takes its inputs, multiplies them by its weights, and then passes the sum through the activation function to other neurons. A neural network learns how to classify an input by adjusting its weights based on previous examples.