This article gives a brief overview of the perceptron algorithm.
The perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron learning algorithm for binary classification of data. The perceptron is a mathematical computational model that can only classify linearly separable data; the algorithm fails if the data is not linearly separable.
It works by finding a hyperplane that splits the data such that all points of one class lie on one side of the hyperplane and all points of the other class lie on the other side.
In theory there exist infinitely many such hyperplanes. The perceptron algorithm does not necessarily find the optimal one: it stops as soon as all the data points are correctly separated.
A perceptron takes the weighted sum of its inputs and outputs 1 if the sum is greater than 0 and -1 if the sum is less than 0.
But what happens if the point lies on the hyperplane, i.e. the weighted sum is equal to 0?
Such a point is still considered a misclassification. If the algorithm encounters a point that lies on the hyperplane, it slightly nudges the parameters so that the point falls on one side of the hyperplane.
Let **x** and **w** be the input and weight vectors respectively.
The perceptron takes **w** and **x** as input and gives a scalar output y, obtained by passing the weighted sum through a threshold function defined as:
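$$y = f(\mathbf{w} \cdot \mathbf{x}), \qquad f(z) = \begin{cases} \;\;\,1 & \text{if } z > 0 \\ -1 & \text{if } z < 0 \end{cases}$$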
We use a bias b to give the hyperplane more freedom: if no bias is used, the hyperplane always passes through the origin.
We treat the bias like any other weight, one whose input is always 1.
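As a minimal sketch of this prediction step in Python (the function and variable names here are illustrative, not from any particular library), with the bias folded into the weight vector:

```python
import numpy as np

def predict(w, x):
    """Perceptron prediction: threshold the weighted sum.

    w: weight vector whose last component is the bias b.
    x: raw input vector; a constant 1 is appended so the bias
       acts like any other weight.
    """
    x_aug = np.append(x, 1.0)   # bias input is always 1
    z = np.dot(w, x_aug)        # weighted sum w.x + b
    return 1 if z > 0 else -1   # z == 0 is treated as misclassified during training
```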
Now that we know what **w** is supposed to do (define a hyperplane that separates the data), let’s look at how we can find such a **w**.
**The perceptron can take both real-valued and Boolean inputs for classification.**
The algorithm starts with all the weights initialised to 0, i.e. w = 0. It then loops through each data point and checks the following:
If the point is correctly classified, nothing changes. If not, the algorithm adjusts the weights based on the following update rule (the standard perceptron update):
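$$\mathbf{w} \leftarrow \mathbf{w} + y_i\,\mathbf{x}_i \tag{1}$$

where $(\mathbf{x}_i, y_i)$ is the misclassified point and $y_i \in \{-1, +1\}$ is its label.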
The only time we say a point is misclassified is when these inequalities do not hold:
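$$\mathbf{w} \cdot \mathbf{x}_i > 0 \;\text{ when } y_i = +1, \qquad \mathbf{w} \cdot \mathbf{x}_i < 0 \;\text{ when } y_i = -1,$$

or, compactly, $y_i(\mathbf{w} \cdot \mathbf{x}_i) > 0$.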
If the data point is negative but lies in the positive half-space, the product becomes:
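$$y_i(\mathbf{w} \cdot \mathbf{x}_i) = (-1)(\mathbf{w} \cdot \mathbf{x}_i) < 0 \quad \text{since } \mathbf{w} \cdot \mathbf{x}_i > 0.$$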
Thus the inequality breaks, so the algorithm adjusts the weights using eq. 1.
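Putting the loop and the update rule together, a minimal sketch of the whole algorithm might look like this (the function name, the max_epochs cap, and the data layout are my own assumptions; labels are assumed to be in {-1, +1}):

```python
import numpy as np

def train_perceptron(X, y, max_epochs=1000):
    """Train a perceptron on X (n x d) with labels y in {-1, +1}.

    Returns the augmented weight vector (bias folded in as the last
    component). Assumes linearly separable data; the max_epochs cap
    is a safeguard for data that is not.
    """
    X_aug = np.hstack([X, np.ones((len(X), 1))])  # append constant bias input 1
    w = np.zeros(X_aug.shape[1])                  # start with w = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X_aug, y):
            if y_i * np.dot(w, x_i) <= 0:   # misclassified or on the hyperplane
                w += y_i * x_i              # eq. 1: nudge w toward the point's side
                mistakes += 1
        if mistakes == 0:                   # all points correctly separated: stop
            break
    return w
```

Note that a point exactly on the hyperplane (dot product 0) also triggers the update, which is the "nudge" described earlier.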
For any binary classification problem, a set S of labelled points is linearly separable if and only if there exists a hyperplane that separates the positive points from the negative points in S.
In high dimensions, data points tend to be farther from each other, which improves the chances of finding a separating hyperplane. In lower dimensions this property does not always hold.
In two dimensions a hyperplane is one-dimensional (a line). This generalises: in n dimensions a hyperplane always has n-1 dimensions.
A hyperplane can be defined as:
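$$\mathcal{H} = \{\mathbf{x} : \mathbf{w} \cdot \mathbf{x} + b = 0\}$$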
If we don’t use a bias, the hyperplane will always pass through the origin. We can solve this issue by adding one more dimension to the vectors: for simplicity we drop b from the hyperplane equation and absorb it into the weight vector, while each input vector gets one additional constant dimension equal to 1. Using this convention,
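$$\hat{\mathbf{x}} = \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix}, \qquad \hat{\mathbf{w}} = \begin{bmatrix} \mathbf{w} \\ b \end{bmatrix}$$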
We can verify this by taking the dot product of the two augmented vectors:
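$$\hat{\mathbf{w}} \cdot \hat{\mathbf{x}} = \mathbf{w} \cdot \mathbf{x} + b$$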
Using the above simplification, the hyperplane equation becomes:
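$$\mathcal{H} = \{\hat{\mathbf{x}} : \hat{\mathbf{w}} \cdot \hat{\mathbf{x}} = 0\}$$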
This transformation forces the hyperplane to pass through the origin in the augmented space, but adding the extra dimension ensures that such a hyperplane exists whenever the original data can be split by a hyperplane with a bias.
The perceptron was the first algorithm with a strong formal guarantee: if the data is linearly separable, the algorithm will find a separating hyperplane in a finite number of steps.
**It is important to note that the most discriminative feature is the input feature with the highest weight magnitude. In general, the input with the highest-magnitude weight has the strongest influence on the perceptron’s output.**