# Data Science Programming All-In-One For Dummies by John Paul Mueller & Luca Massaron

Author: John Paul Mueller & Luca Massaron
Language: eng
Format: epub
ISBN: 9781119626145
Publisher: Wiley
Published: 2019-12-11T00:00:00+00:00

Touching the nonseparability limit

The secret to perceptron calculations is in how the algorithm updates the vector w values. Such updates happen by randomly picking one of the misclassified examples. You have a misclassified example when the perceptron determines that an example is part of the class, but it isn’t, or when the perceptron determines that an example isn’t part of the class, but it is. The perceptron handles one misclassified example at a time (call it xt) and operates by changing the w vector using a simple weighted addition:

w = w + η(xt * yt)

This formula is called the update strategy of the perceptron, and the letters stand for different numerical elements:

The letter w is the coefficient vector, which is updated so that it correctly shows whether the misclassified example t is part of the class.

The Greek letter eta (η) is the learning rate. It’s a floating-point number between 0 and 1. When you set this value near zero, the update barely changes the vector w, whereas setting the value near one makes the update fully impact the w vector values. Setting different learning rates can speed up or slow down the learning process. Many other algorithms use this strategy, and a lower eta improves the optimization process by reducing the number of sudden w value jumps after an update. The trade-off is that you have to wait longer before getting the concluding results.

The xt variable refers to the vector of numeric features for the example t.

The yt variable refers to the ground truth of whether the example t is part of the class. For the perceptron algorithm, yt is numerically expressed with +1 when the example is part of the class and with -1 when the example is not part of the class.
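The update strategy described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's code: the toy dataset, the learning rate of 0.5, and the iteration cap are all made-up choices, and the example assumes a linearly separable problem with no bias term so that the formula w = w + η(xt * yt) can be applied exactly as written.

```python
import numpy as np

# Made-up toy data: two features per example, class labels are +1 or -1.
X = np.array([[2.0, 1.0],
              [1.0, 3.0],
              [-1.0, -2.0],
              [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

eta = 0.5          # learning rate eta, a float between 0 and 1 (arbitrary here)
w = np.zeros(2)    # coefficient vector w, started at zero

rng = np.random.default_rng(0)
for _ in range(100):
    preds = np.sign(X @ w)              # current class guesses for every example
    wrong = np.flatnonzero(preds != y)  # indices of misclassified examples
    if wrong.size == 0:
        break                           # no mistakes left, so learning stops
    t = rng.choice(wrong)               # randomly pick one misclassified example
    w = w + eta * (X[t] * y[t])         # the update: w = w + eta * (xt * yt)

print(w)
print(np.sign(X @ w))
```

Because this tiny dataset is linearly separable, the loop reaches a w that classifies every example correctly and exits early; on nonseparable data the loop would instead keep cycling until the iteration cap, which is the limit the section's title refers to.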