Knowledge generalization is, arguably, the biggest challenge of machine learning systems. It is relatively easy to create models that match a training set, but the story is different when it comes to performing against the test set or another unseen dataset. Many machine learning models overfit the training data and then perform poorly on new datasets. In order to generalize knowledge effectively, machine learning algorithms leverage statistical estimation techniques. Instead of inferring the exact formula for a specific parameter, machine learning models rely on statistics to estimate the distribution of those parameters. Among the many properties of estimators you should consider in machine learning algorithms, bias and variance play a prominent role in achieving good generalization.
Understanding bias and variance is, essentially, a matter of analyzing the differences between estimated and known data points. Sometimes those data points are parameters in a function; other times, they are the entire function itself. Finding an acceptable equilibrium between bias and variance is a key aspect of regularization and generalization strategies in machine learning models.
When evaluating the bias of an estimator, we are looking at the difference between the estimate of a quantity and its true value. Mathematically, we can represent the bias of an estimator using the following formula:
bias(P) = E(P) - Vp(P)
where E(P) is the expected value of the estimate of a parameter P and Vp(P) is its true value.
An estimator is said to be unbiased if E(P) = Vp(P), that is, if bias(P) = 0. Similarly, an estimator is called asymptotically unbiased if its bias tends towards 0 as the number of examples grows.
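To make these definitions concrete, here is a minimal sketch that approximates the bias of an estimator by simulation. It is an illustration under stated assumptions, not a canonical recipe: the helper names (`variance_divide_by_n`, `estimate_bias`), the Gaussian data with standard deviation 2, and the trial counts are all choices made for this example. It compares a variance estimator that divides by n (biased, but asymptotically unbiased) with the standard library's `statistics.variance`, which divides by n - 1.

```python
import random
import statistics

def variance_divide_by_n(sample):
    """Sample variance that divides by n; this estimator is biased."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / len(sample)

def estimate_bias(estimator, true_value, sample_size, trials=10000, seed=0):
    """Approximate bias = E(estimate) - true value over many simulated datasets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Assumption for this sketch: data are Gaussian with mean 0 and std 2.
        sample = [rng.gauss(0.0, 2.0) for _ in range(sample_size)]
        estimates.append(estimator(sample))
    return statistics.fmean(estimates) - true_value

TRUE_VARIANCE = 4.0  # variance of a Gaussian with standard deviation 2

# Dividing by n underestimates the variance, noticeably so for small samples.
bias_small_n = estimate_bias(variance_divide_by_n, TRUE_VARIANCE, sample_size=5)
# The same estimator with more data: the bias shrinks towards 0.
bias_large_n = estimate_bias(variance_divide_by_n, TRUE_VARIANCE, sample_size=100)
# statistics.variance divides by n - 1, which removes the bias.
bias_unbiased = estimate_bias(statistics.variance, TRUE_VARIANCE, sample_size=5)

print(bias_small_n, bias_large_n, bias_unbiased)
```

In theory the divide-by-n estimator has a bias of minus the true variance divided by n, so the first value should land near -0.8 and the second near -0.04, while the third should hover around 0 up to simulation noise.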
The key to calculating the bias of a hypothesis lies in finding the correct estimator. Well-known statistical distributions such as the Bernoulli or the Gaussian come with standard estimators that have proven effective in many machine learning algorithms.
Moving away from mathematics, we can illustrate the concept of bias using examples from our everyday lives. A clock that is always one hour late or a quantitative trading algorithm that always predicts the price of a stock 1% too high are examples of estimators with a strong bias. In the context of machine learning algorithms, one goal is to build models whose bias is as low as possible.
When estimating a hypothesis in a machine learning algorithm, we should also estimate how much it will vary as a function of the underlying dataset. This characteristic is known as variance, and it is a strong complement to the bias property of estimators. Conceptually, the variance of an estimator quantifies how much its value will vary when the dataset is resampled (for example, when moving from training to testing). The square root of the variance is another relevant metric, known as the standard error.
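The resampling idea can be sketched in a few lines. The example below is an illustration under assumptions chosen for this sketch: the estimator is the sample mean, the data are Gaussian with mean 10 and standard deviation 3, and each dataset holds 25 points. The names `estimator_variance` and `draw_dataset` are hypothetical helpers, not part of any library.

```python
import math
import random
import statistics

def estimator_variance(estimator, draw_dataset, trials=5000):
    """Variance of an estimator's value across many independently drawn datasets."""
    estimates = [estimator(draw_dataset()) for _ in range(trials)]
    return statistics.pvariance(estimates)

rng = random.Random(42)
SIGMA = 3.0  # assumed standard deviation of the data
N = 25       # assumed dataset size

def draw_dataset():
    """Draw a fresh dataset of N Gaussian points (mean 10, std SIGMA)."""
    return [rng.gauss(10.0, SIGMA) for _ in range(N)]

# How much does the sample mean move from one dataset to the next?
var_of_mean = estimator_variance(statistics.fmean, draw_dataset)

# The standard error is the square root of that variance.
standard_error = math.sqrt(var_of_mean)

# For the sample mean, theory gives sigma / sqrt(n) as the standard error.
theoretical_standard_error = SIGMA / math.sqrt(N)

print(standard_error, theoretical_standard_error)
```

The simulated standard error should come out close to the theoretical value of 0.6, which gives a quick sanity check that the resampling procedure measures what the text describes.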
Going back to our real-world examples, a clock that always runs exactly one hour late has a strong bias but a low variance, because its error never changes. A clock that is all over the place but, averaged over time, approximates the real time has a low bias but a high variance. Similarly, if the predictions of our quantitative trading algorithm change drastically from stock to stock, we say that the estimator has a high variance.
How are bias and variance relevant in machine learning models? In super simple terms, the art of generalization can be summarized as reducing the bias of a model without increasing its variance. Many machine learning models already include methods for tracking variance and bias, and we should pay attention to both properties if we want to avoid overfitting and improve the performance of machine learning models on unknown datasets.
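The tension between the two properties can be seen in a small simulation. The sketch below is only an illustration, and everything in it is an assumption made for the example: the target function sin(2πx), the noise level, the training set size, and the two toy models. One model ignores its input and predicts the mean label, the other copies the label of the nearest training point, and both are evaluated at a single point across many resampled training sets.

```python
import math
import random
import statistics

def true_function(x):
    """Assumed ground truth for this sketch."""
    return math.sin(2 * math.pi * x)

def draw_training_set(rng, size=30, noise_std=0.5):
    """Sample x uniformly in [0, 1) and add Gaussian noise to the true labels."""
    xs = [rng.random() for _ in range(size)]
    return [(x, true_function(x) + rng.gauss(0.0, noise_std)) for x in xs]

def constant_model(train, x):
    """Very rigid model: ignores x and predicts the mean training label."""
    return statistics.fmean([y for _, y in train])

def nearest_neighbor_model(train, x):
    """Very flexible model: predicts the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def bias_and_variance(model, x0, trials=2000, seed=1):
    """Bias and variance of a model's prediction at x0 across resampled training sets."""
    rng = random.Random(seed)
    predictions = [model(draw_training_set(rng), x0) for _ in range(trials)]
    bias = statistics.fmean(predictions) - true_function(x0)
    variance = statistics.pvariance(predictions)
    return bias, variance

X0 = 0.25  # evaluation point, where the true function equals 1
constant_bias, constant_variance = bias_and_variance(constant_model, X0)
nn_bias, nn_variance = bias_and_variance(nearest_neighbor_model, X0)

print("constant model:   bias=%.3f variance=%.3f" % (constant_bias, constant_variance))
print("nearest neighbor: bias=%.3f variance=%.3f" % (nn_bias, nn_variance))
```

In this setup the rigid model should show a large bias with a small variance, and the flexible model the reverse, which is the pattern that regularization and model selection try to balance.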