A Different Way to Think About Overfitting and Underfitting in Machine Learning Part I: Capacity
In the past, I’ve written extensively about overfitting and underfitting and their roles in machine learning models (check out my article about Borges and overfitting). Most of those articles explained the concepts at a cognitive/psychological level, which is helpful but not immediately applicable to machine learning algorithms. Today, I would like to present a couple of pseudo-mathematical ideas that may give you a framework for dealing with overfitting and underfitting in machine learning models.
Dumb or Hallucinating
Challenges such as overfitting and underfitting are related to the capacity of a machine learning model to build relevant knowledge from an initial set of training examples. Conceptually, underfitting is associated with the inability of a machine learning algorithm to infer valid knowledge from the initial training data. In contrast, overfitting is associated with models that create hypotheses fitted so closely to the training examples, noise included, that they are not practical on new data. Put in simpler terms, underfitting models are sort of dumb, while overfitting models tend to hallucinate (imagine things that don’t exist) :).
Understanding Model Capacity
Let’s try to formulate a simple methodology to understand overfitting and underfitting in the context of machine learning algorithms.
A typical machine learning scenario starts with an initial dataset that we use to train and test the performance of an algorithm. A common rule of thumb is to use 80% of the dataset to train the model while keeping the remaining 20% to test it. During the training phase, our model will produce a certain deviation from the training data, which is often referred to as the Training Error. Similarly, the deviation produced during the test phase is referred to as the Test Error. From that perspective, the performance of a machine learning model can be judged on its ability to accomplish two fundamental things:
1 — Reduce the Training Error
2 — Reduce the gap between the Training and Test Errors
Those two simple rules can help us understand the concepts of overfitting and underfitting. Basically, underfitting occurs when a model fails at rule #1 and is not able to obtain a sufficiently low error on the training set. Overfitting happens when a model fails at rule #2 and the gap between the test and training errors is too large. You see? Two simple rules that help us quantify the levels of overfitting and underfitting in machine learning algorithms.
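To make the two rules a bit more tangible, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the synthetic data (y = 3x + 2 plus noise), the use of mean squared error as the "deviation", and the hypothetical `diagnose` helper with its arbitrary thresholds. It is one possible way to compute the Training Error, the Test Error and the gap between them, not a definitive recipe.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def mean_squared_error(w, b, xs, ys):
    """Average squared deviation between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def diagnose(training_error, test_error, max_training_error, max_gap):
    """Hypothetical helper: apply rule #1, then rule #2.

    Both thresholds are arbitrary and problem-specific (an assumption).
    """
    if training_error > max_training_error:
        return "underfitting"  # rule #1 failed: training error too high
    if test_error - training_error > max_gap:
        return "overfitting"   # rule #2 failed: gap too large
    return "ok"


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic dataset (assumption): y = 3x + 2 plus Gaussian noise.
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
data = [(x, 3.0 * x + 2.0 + rng.gauss(0.0, 1.0)) for x in xs]

# 80/20 split after shuffling.
rng.shuffle(data)
split = len(data) * 80 // 100
train, test = data[:split], data[split:]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

w, b = fit_line(train_x, train_y)
training_error = mean_squared_error(w, b, train_x, train_y)
test_error = mean_squared_error(w, b, test_x, test_y)
gap = test_error - training_error

print(f"training error: {training_error:.3f}")
print(f"test error:     {test_error:.3f}")
print(f"gap:            {gap:.3f}")
print(diagnose(training_error, test_error, max_training_error=2.0, max_gap=1.0))
```

The thresholds passed to `diagnose` only make sense relative to the noise level of this toy dataset; on a real problem you would have to pick them based on what error is acceptable for the task.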
Another super important concept that tremendously helps machine learning practitioners deal with underfitting and overfitting is the notion of Capacity. Conceptually, Capacity represents the set of functions that a machine learning model can select as a possible solution. For instance, a linear regression model has as its Capacity all degree 1 polynomials of the form y = w*x + b (meaning all the potential solutions).
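One way to picture Capacity as "the functions a model can choose from" is the toy sketch below. It is a deliberate simplification: real linear regression searches over all real-valued w and b, while here I assume a tiny grid of integer candidates so the whole family can be listed and searched by brute force. The names `make_linear`, `training_error` and `candidates` are hypothetical.

```python
from itertools import product


def make_linear(w, b):
    """Return one member of the family y = w*x + b."""
    return lambda x: w * x + b


def training_error(f, xs, ys):
    """Mean squared deviation of f from the training targets."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# A tiny, discretized stand-in for the model's Capacity (assumption):
# every (w, b) pair with integer values between -3 and 3.
candidates = list(product(range(-3, 4), repeat=2))

# Training data generated by y = 2x + 1, with no noise.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

# "Training" here just means picking the candidate with the lowest error.
best_w, best_b = min(
    candidates,
    key=lambda wb: training_error(make_linear(*wb), xs, ys),
)

print(len(candidates), best_w, best_b)  # 49 2 1
```

The model can only ever return one of the 49 lines in `candidates`. If the data had come from a curve, no choice of w and b would fit it well, which is the low-Capacity situation.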
Capacity is an incredibly relevant concept in machine learning models. Technically, a machine learning algorithm performs best when its Capacity is appropriate for the complexity of its task and the amount of training data it is given. Machine learning models with low Capacity are impractical when it comes to solving complex tasks and tend to underfit. Along the same lines, models with higher Capacity than needed are prone to overfit. From that perspective, Capacity represents a measure by which we can estimate the propensity of a model to underfit or overfit.
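A common way to experiment with Capacity is to vary the degree of a polynomial model, since each higher degree contains all the lower-degree polynomials as special cases. The sketch below is written under several assumptions of my own: the data comes from y = x^2 plus Gaussian noise, the fit uses the normal equations solved with a small hand-written Gaussian elimination, and the degrees 1, 2 and 6 are arbitrary picks. It is meant as an illustration, not a production-grade fitting routine.

```python
import random


def solve_linear_system(a, rhs):
    """Solve a * coeffs = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - tail) / m[r][r]
    return coeffs


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(a, rhs)


def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial with the given coefficients."""
    def predict(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable


def sample(n):
    """Evenly spaced x in [-1, 1]; y = x^2 plus noise (assumed ground truth)."""
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    ys = [x ** 2 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


train_x, train_y = sample(12)
test_x, test_y = sample(50)

errors = {}
for degree in (1, 2, 6):
    coeffs = fit_polynomial(train_x, train_y, degree)
    errors[degree] = (mse(coeffs, train_x, train_y), mse(coeffs, test_x, test_y))
    train_err, test_err = errors[degree]
    print(f"degree {degree}: train={train_err:.4f} test={test_err:.4f} "
          f"gap={test_err - train_err:.4f}")
```

Since the data is quadratic, the degree 1 model lacks the Capacity to follow it and should show a high Training Error (rule #1). The Training Error cannot go up as the degree grows, because each larger family contains the smaller ones, but the Test Error has no such guarantee. How large the gap becomes for the degree 6 model depends on the noise and on how few training points there are.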
We will cover a few other machine learning theories relevant to understanding overfitting and underfitting in the second part of this essay.