Artificial neural networks are statistical learning models, inspired by biological neural networks (central nervous systems, such as the brain), that are used in machine learning. These networks are represented as systems of interconnected “neurons”, which send messages to each other. The connections within the network can be systematically adjusted based on inputs and outputs, which makes them well suited to supervised learning.
Neural networks can be intimidating, especially for people with little experience in machine learning and cognitive science! However, through code, this tutorial will explain how neural networks operate. By the end, you will know how to build your own flexible, learning network, similar to Mind.
The only prerequisites are a basic understanding of JavaScript, high-school calculus, and simple matrix operations. Other than that, you don’t need to know anything. Have fun!
A neural network is a collection of “neurons” with “synapses” connecting them. The collection is organized into three main parts: the input layer, the hidden layer, and the output layer. Note that you can have n hidden layers, with the term “deep” learning implying multiple hidden layers.
Hidden layers are necessary when the neural network has to make sense of something really complicated, contextual, or non-obvious, like image recognition. These layers are known as “hidden” since they are not visible as a network output.
The circles represent neurons and lines represent synapses. Synapses take the input and multiply it by a “weight” (the “strength” of the input in determining the output). Neurons add the outputs from all synapses and apply an activation function.
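As a rough illustration of what a single neuron does, here is a minimal sketch. The helper name `neuronOutput` is hypothetical, and the sigmoid activation is just one possible choice, assumed here for the example.

```javascript
// Sigmoid is used only as an example activation function (an assumption).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Hypothetical helper: each synapse multiplies its input by a weight,
// then the neuron adds the results and applies the activation function.
function neuronOutput(inputs, weights, activation) {
  if (inputs.length !== weights.length) {
    throw new Error('inputs and weights must have the same length');
  }
  let sum = 0;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return activation(sum);
}

// Two inputs, two synapse weights, one neuron.
console.log(neuronOutput([1, 1], [0.8, 0.2], sigmoid));
```

Passing the activation in as a parameter keeps the sketch independent of any particular function.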
Training a neural network basically means calibrating all of the “weights” by repeating two key steps, forward propagation and back propagation.
Since neural networks are well suited to regression, the most natural input data are numbers (discrete values, like colors or movie genres, must first be encoded numerically, or may be better served by statistical classification models). The output data will be a number within a range such as 0 to 1 (this ultimately depends on the activation function; more on this below).
In forward propagation, we apply a set of weights to the input data and calculate an output. For the first forward propagation, the set of weights is selected randomly.
In back propagation, we measure the margin of error of the output and adjust the weights accordingly to decrease the error.
Next, we’ll walk through a simple example of training a neural network to function as an “Exclusive or” (“XOR”) operation to illustrate each step in the training process.
Note that intermediate values in the calculations below are rounded to a few decimal places, so some results differ slightly from a full-precision computation.
The XOR function can be represented by the mapping of inputs to outputs below, which we’ll use as training data. Once trained, the network should produce the correct output for any input the XOR function accepts.
input | output
--------------
0, 0 | 0
0, 1 | 1
1, 0 | 1
1, 1 | 0
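In code, the table above might be written as an array of input/output pairs. The `{ input, output }` object shape and the name `trainingData` are assumptions made for this sketch, not a required format.

```javascript
// XOR training data: each entry pairs an input vector with its target output.
// The object shape is an assumption made for illustration.
const trainingData = [
  { input: [0, 0], output: [0] },
  { input: [0, 1], output: [1] },
  { input: [1, 0], output: [1] },
  { input: [1, 1], output: [0] },
];

// Sanity check each row against JavaScript's built-in bitwise XOR operator.
for (const { input, output } of trainingData) {
  const expected = input[0] ^ input[1];
  console.log(input.join(', '), '|', output[0], expected === output[0] ? 'ok' : 'mismatch');
}
```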
Let’s use the last row from the above table, (1, 1) => 0, to demonstrate forward propagation:
We now assign weights to all of the synapses. Note that these weights are selected randomly (for example, drawn from a Gaussian distribution) since it is the first time we’re forward propagating. In this example the initial weights lie between 0 and 1, but note that the final weights don’t need to be.
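One possible way to draw random starting weights is the Box-Muller transform, sketched below. The helper names `randomGaussian` and `randomWeights` are hypothetical, and the standard normal distribution (mean 0, variance 1) is an assumption. Samples drawn this way are not confined to the range 0 to 1, unlike the hand-picked weights used in the worked example.

```javascript
// Draw one sample from a standard normal distribution (mean 0, variance 1)
// using the Box-Muller transform. Hypothetical helper for illustration.
function randomGaussian() {
  let u = 0;
  while (u === 0) u = Math.random(); // avoid Math.log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Build a rows x cols matrix of random weights, one per synapse.
function randomWeights(rows, cols) {
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => randomGaussian())
  );
}

// Two input neurons feeding three hidden neurons, as in this example network.
const initialWeights = randomWeights(2, 3);
console.log(initialWeights);
```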
We sum the products of the inputs and their corresponding weights to arrive at the first values for the hidden layer. You can think of the weights as measures of the influence the input nodes have on the output.
1 * 0.8 + 1 * 0.2 = 1
1 * 0.4 + 1 * 0.9 = 1.3
1 * 0.3 + 1 * 0.5 = 0.8
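The three sums above can be reproduced with a small loop. In this sketch the weights are stored as a matrix with one row per input neuron and one column per hidden neuron. That layout and the name `weightedSums` are assumptions made for illustration.

```javascript
const input = [1, 1];

// inputToHidden[i][j] is the weight from input neuron i to hidden neuron j.
const inputToHidden = [
  [0.8, 0.4, 0.3], // weights leaving the first input
  [0.2, 0.9, 0.5], // weights leaving the second input
];

// Hypothetical helper: multiply each input by its weight and sum per target neuron.
function weightedSums(inputs, weights) {
  const sums = new Array(weights[0].length).fill(0);
  for (let i = 0; i < inputs.length; i++) {
    for (let j = 0; j < weights[i].length; j++) {
      sums[j] += inputs[i] * weights[i][j];
    }
  }
  return sums;
}

const hiddenSums = weightedSums(input, inputToHidden);
// Rounded to one decimal place, these match the sums worked out by hand.
console.log(hiddenSums.map((s) => s.toFixed(1)));
```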
To get the final value, we apply the activation function to the hidden layer sums. The purpose of the activation function is to transform the input signal into an output signal; activation functions are necessary for neural networks to model complex non-linear patterns that simpler models might miss.
There are many types of activation functions — linear, sigmoid, hyperbolic tangent, even step-wise. To be honest, I don’t know why one function is better than another.
For our example, let’s use the sigmoid function for activation. The sigmoid function looks like this, graphically:
Applying the sigmoid function, S(x) = 1 / (1 + e^(-x)), to the three hidden layer sums, we get:
S(1.0) = 0.73105857863
S(1.3) = 0.78583498304
S(0.8) = 0.68997448112
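For reference, here is a minimal sketch of the same step in code, assuming the standard logistic sigmoid S(x) = 1 / (1 + e^(-x)), which agrees with the values listed above.

```javascript
// Standard logistic sigmoid: squashes any real number into the range (0, 1).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Hidden layer sums from the previous step.
const hiddenSums = [1.0, 1.3, 0.8];

// Apply the activation function to each sum to get the hidden layer results.
const hiddenResults = hiddenSums.map(sigmoid);
console.log(hiddenResults);
```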
We add that to our neural network as hidden layer results:
Then, we sum the product of the hidden layer results with the second set of weights (also determined at random the first time around) to determine the output sum.
0.73 * 0.3 + 0.79 * 0.5 + 0.69 * 0.9 = 1.235
Finally, we apply the activation function to get the final output result.
S(1.235) = 0.7746924929149283
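Putting the steps together, a full forward pass for this small network could be sketched as follows. The function name `forward` and the shape of the returned object are assumptions. Because the code keeps full precision instead of rounding the hidden results to two decimals, it gives an output sum of about 1.233 and an output of about 0.774, slightly different from the hand calculation.

```javascript
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Hypothetical forward pass for a network with one hidden layer and one output.
// inputToHidden[i][j]: weight from input i to hidden neuron j.
// hiddenToOutput[j]: weight from hidden neuron j to the output neuron.
function forward(input, inputToHidden, hiddenToOutput) {
  const hiddenSums = hiddenToOutput.map((_, j) =>
    input.reduce((acc, x, i) => acc + x * inputToHidden[i][j], 0)
  );
  const hiddenResults = hiddenSums.map(sigmoid);
  const outputSum = hiddenResults.reduce(
    (acc, h, j) => acc + h * hiddenToOutput[j],
    0
  );
  return { hiddenSums, hiddenResults, outputSum, output: sigmoid(outputSum) };
}

const result = forward(
  [1, 1],
  [
    [0.8, 0.4, 0.3],
    [0.2, 0.9, 0.5],
  ],
  [0.3, 0.5, 0.9]
);
console.log(result.outputSum.toFixed(3), result.output.toFixed(3));
```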
This is our full diagram:
Because we used a random set of initial weights, the value of the output neuron is off the mark; in this case by +0.77 (since the target is 0). If we stopped here, this set of weights would be a great neural network for inaccurately representing the XOR operation.
Let’s fix that by using back propagation to adjust the weights to improve the network in the next …..
Credit: BecomingHuman By: Meghashyam Thiruveedula