Let's imagine that you've charted your data on apples and bananas. Fortunately, the two groups form neat clusters, so all we need to do is draw a line between them. The condition of being able to draw a line between two groups is called "linear separability".
Since we want to find a line between the two groups, we'll have to introduce some math. Mainly I'll use some simple algebra and geometry. Don't worry if you don't understand it all right away, just keep reading. It should all start to make sense eventually.
To find a line between the apples and bananas, we'll assign weights to an x and a y coordinate, add a bias b, and set the equation equal to 0. Thus, we have an equation for the line w1*x + w2*y + b = 0.
We call the vector w a weight vector (bold letters denote vectors), which in this case looks like [w1, w2]. The number of weights will be equal to the number of variables in any input vector, which is called the dimensionality of the input vector. Technically, a perceptron tries to find a hyperplane to separate points in an n-dimensional space. In this case, that hyperplane is a line.
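To see how such an equation splits the plane, here is a minimal Python sketch; the weights w = [1.0, 2.0] and bias b = -4.0 are made up purely for illustration, not learned:

```python
# The line 1.0*x + 2.0*y - 4.0 = 0, with made-up weights and bias.
w = [1.0, 2.0]
b = -4.0

def side(point):
    """Return the sign of w . point + b: which side of the line point is on."""
    total = sum(wi * pi for wi, pi in zip(w, point)) + b
    return (total > 0) - (total < 0)

print(side([3.0, 3.0]))   # 1: above the line
print(side([0.0, 0.0]))   # -1: below the line
print(side([0.0, 2.0]))   # 0: exactly on the line
```

Every point on one side of the line gives a positive value, every point on the other side a negative one, which is exactly the property the perceptron exploits to separate two classes.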
Code (technical details)
If you are math and/or code averse, but still interested, you can skip this section and the next, right to "Application".
I've coded an example perceptron in Python, using gnuplot (via the Gnuplot.py module) to draw pretty graphs and numarray for the math. The code definitely runs on at least Windows and Linux, and it should run on OS X.
What we have is a vector x of data points and a vector y of classifications. y contains one element for each point in x, and an element yi in y is either 1 (banana) or -1 (apple). We'll pass these into our perceptron along with a number n, where 0 < n < 1.
The parameter n is called the learning rate, and it affects how quickly the perceptron converges. If n is too high, the discriminant line will swing wildly. Too low, and it won't change quickly enough. n only changes the rate at which the perceptron learns, so as long as the data is linearly separable, it will still converge no matter what n you choose.
def perceptron(x, y, n):
Now we set our weights, our bias, a mistake counter, an iteration counter, and a boolean variable all to 0. "na." stands for numarray, my linear algebra library. na.zeros(N, type) creates a zero vector of length N with the given element type.
    w = na.zeros(x.shape[1], na.Float32) #weights, one per input dimension
    b = 0 #bias
    k = 0 #error count
    iteration = 0 #number of iterations
    mistake_free = 0
Our main loop is simple. When the perceptron has made a mistake classifying one of the pieces of data, mistake_free is set to false, and we want to continue trying to find a discriminant line. In the case where the input data is not separable, we want to stop iterating our loop at some point. In this case, I've chosen 100 as the iteration limit.
    while not mistake_free and iteration < 100:
        iteration += 1
        mistake_free = 1
Each iteration, we want to loop through x (our known data points) and calculate the perceptron's current classification of each one. If the perceptron correctly identifies each banana as a banana and each apple as an apple, we're done. Otherwise, we're going to have to change w and b.
        for i in range(len(x)):
            actual_out = na.dot(w, x[i]) + b #<w*x> + b
            if y[i] * actual_out <= 0: #if misclassified.
If we made a mistake, we update the weights such that w = w + n * y[i] * x[i]. Remember that n is our learning rate and y[i] is the correct classification of the current point x[i]. Also, set the bias so b = b + y[i] - actual_out, where actual_out is the incorrect output of the perceptron for x[i].
The reason for the weight updates is fairly simple. If the perceptron misclassified an apple as a banana, the output was too high; since y[i] = -1, the update subtracts n*x[i] from w. Conversely, if a banana was misclassified as an apple, y[i] = 1 and we add n*x[i] to w. The bias update works similarly.
                w += n * y[i] * x[i]
                b += y[i] - actual_out
                k += 1
                mistake_free = 0
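To make the update concrete, here is one hand-traced step with made-up numbers: a freshly zeroed perceptron, n = 0.5, and a misclassified banana at [2, 2]:

```python
n = 0.5                                       # learning rate (made up)
w = [0.0, 0.0]                                # freshly zeroed weights
b = 0.0                                       # freshly zeroed bias
xi, yi = [2.0, 2.0], 1.0                      # a banana the perceptron gets wrong
actual_out = w[0] * xi[0] + w[1] * xi[1] + b  # 0.0, so yi * actual_out <= 0: a mistake
w = [w[0] + n * yi * xi[0], w[1] + n * yi * xi[1]]
b = b + yi - actual_out
print(w, b)  # [1.0, 1.0] 1.0
```

With the new weights, 1.0*2 + 1.0*2 + 1.0 = 5, which is positive, so [2, 2] now lands on the banana side of the discriminant.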
Finally, we just return our weights, our bias, and our mistake counter. While we don't strictly need to return the mistake counter, it's good to know how long it took the perceptron to classify your data correctly. It can help you choose a better learning rate or see that your data is poorly separated.
    return (w, b, k)
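Since numarray has long been unmaintained, here is the same routine as a self-contained sketch using NumPy, its modern successor. The toy data and learning rate are made up for demonstration:

```python
import numpy as np

def perceptron(x, y, n):
    """Perceptron training as described above, rewritten with NumPy.
    x is an array of points, y the matching array of +1/-1 labels,
    and n the learning rate."""
    w = np.zeros(x.shape[1])   # one weight per input dimension
    b = 0.0                    # bias
    k = 0                      # error count
    iteration = 0
    mistake_free = False
    while not mistake_free and iteration < 100:
        iteration += 1
        mistake_free = True
        for i in range(len(x)):
            actual_out = np.dot(w, x[i]) + b
            if y[i] * actual_out <= 0:       # misclassified
                w += n * y[i] * x[i]
                b += y[i] - actual_out
                k += 1
                mistake_free = False
    return w, b, k

# Two toy clusters: "apples" (-1) near the origin, "bananas" (+1) further out.
x = np.array([[0., 0.], [0., 1.], [1., 0.], [2., 2.], [2., 3.], [3., 2.]])
y = np.array([-1., -1., -1., 1., 1., 1.])
w, b, k = perceptron(x, y, 0.5)
print(w, b, k)  # a separating line, plus how many corrections it took
```

After training, np.dot(x, w) + b is negative for every apple and positive for every banana, so the learned line separates the two toy clusters.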
So, now that we've got our data all gathered and our perceptron set up, it's time to apply it. To do so, we put the coordinates of the data points in x, and their corresponding classes, represented by 1 or -1, in y. The result will look something like this. (It will look slightly different if you run the program yourself, since the points are generated randomly.)
We now have a set of weights [w1, w2] and a bias b which make it easy to determine if an unknown piece of fruit is probably a banana or probably an apple. Given its size (s) and its yellowness (t), we simply plug these two numbers back into our original equation, w1*s + w2*t + b, and if the result is positive, we have a banana. If the result is negative, we have an apple. If the result is 0, we have a point on the discriminant line, which means we're unsure about which it is.
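The decision rule can be wrapped in a small helper. The trained values w1 = 1.0, w2 = 0.5, b = -2.0 below are made up for illustration; in practice you would use whatever the perceptron returned:

```python
def classify(w1, w2, b, s, t):
    """Apply the trained discriminant to a fruit of size s and yellowness t."""
    result = w1 * s + w2 * t + b
    if result > 0:
        return "banana"
    elif result < 0:
        return "apple"
    else:
        return "unsure"   # exactly on the discriminant line

# With made-up trained values w1=1.0, w2=0.5, b=-2.0:
print(classify(1.0, 0.5, -2.0, 3.0, 2.0))   # banana
print(classify(1.0, 0.5, -2.0, 0.5, 1.0))   # apple
```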
Perceptrons were introduced in 1957 by Frank Rosenblatt, with the intention of reproducing the way that brains work. For a time, especially in the early 1960s, they were researched quite heavily. In 1969, Minsky and Papert wrote Perceptrons, a book which highlighted the limitations of the technique. They proved that a single perceptron can only solve linearly separable problems, famously showing that it cannot compute the XOR function.
Before Perceptrons, researchers had been creating networks of perceptrons, but almost nobody had studied their limitations. The book stifled nearly all research into them during the 1970s and early 1980s. However, in the 1980s, researchers discovered that multilayer perceptrons did not have the limitations that Minsky had conjectured they did, and neural network research began again in earnest.
For simple two-class classification tasks, perceptrons can function as a quick and dirty method of finding a discriminant. They will run quickly, and are guaranteed to find a solution if one exists. Furthermore, the algorithm is very simple to implement and widely studied.
The real power of the perceptron, however, lies in networking them. A series of interconnected perceptrons, in modified form, is the basis of a neural network. A neural network with even a single hidden layer can approximate any continuous function, and it will perform better on poorly separated and noisy data. Of course, with this power comes difficulty, as neural networks are fairly complicated and difficult to use.
Bishop, Christopher M. Neural Networks for Pattern Recognition, Oxford University Press, 1995.
Cristianini, Nello and Shawe-Taylor, John. An Introduction to Support Vector Machines, Cambridge University Press, 2000.
Minsky, M. L. and Papert, S. A. Perceptrons, MIT Press, 1969.
Rosenblatt, Frank. The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain, Cornell Aeronautical Laboratory, Psychological Review, v65, No. 6, pp. 386-408, 1958.