Simplicity, Bias, Variance and Some Ideas to Solve Overfitting in Machine Intelligence Models
Yesterday, we introduced the concept of overfitting as one of the biggest challenges of machine intelligence (MI) applications. Today, I would like to explore a few ideas to deal with overfitting in MI models.
In yesterday’s post, I drew the parallel between overfitting and hallucinations. Conceptually, overfitting occurs when MI algorithms infer incorrect knowledge or patterns from datasets. Obviously, the potential consequences of overfitting in MI models can be catastrophic if not handled appropriately. Then we have to ask ourselves how to protect MI models from “hallucinating”. There are a few ideas that might help.
1 — Data-Hypothesis Ratio
Apart from poor algorithm design, overfitting mostly occurs when a model produces too many hypotheses without enough data to validate them. As a result, MI applications should try to keep a reasonable ratio between the size of the test datasets and the number of hypotheses to be evaluated. However, this is not always an option.
Many MI algorithms, such as inductive learning, rely on constantly generating new and sometimes more complex hypotheses. In those scenarios, there are statistical techniques that can help estimate how many hypotheses to consider in order to maximize the chances of finding one that is close to correct. Harvard professor Leslie Valiant brilliantly explains this concept in his book Probably Approximately Correct.
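To make the data-to-hypothesis relationship concrete, here is a minimal sketch of a standard textbook bound from PAC learning theory. It assumes a finite set of hypotheses, at least one of which fits the data perfectly. Under those assumptions, a hypothesis consistent with m examples has true error at most epsilon with probability at least 1 - delta, provided m >= (ln|H| + ln(1/delta)) / epsilon. The function name and parameters are illustrative and are not taken from Valiant's book.

```python
import math


def pac_sample_bound(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Examples sufficient for a finite hypothesis class (realizable case).

    Implements m >= (ln|H| + ln(1/delta)) / epsilon, rounded up.
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must be strictly between 0 and 1")
    return math.ceil((math.log(num_hypotheses) + math.log(1 / delta)) / epsilon)


# More candidate hypotheses require more data for the same guarantee.
small_class = pac_sample_bound(num_hypotheses=1_000, epsilon=0.1, delta=0.05)
large_class = pac_sample_bound(num_hypotheses=1_000_000, epsilon=0.1, delta=0.05)
print(small_class, large_class)
```

The required data grows only logarithmically with the number of hypotheses, but it does grow, which is why generating hypotheses faster than data arrives tends to invite overfitting.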
2 — Simpler Hypotheses
A conceptually trivial but technically difficult idea to deal with overfitting in MI models is to generate simpler hypotheses. Of course! Simple is always better, isn’t it? But what is a simpler hypothesis in the context of MI algorithms? If we need to reduce it to a quantitative factor, I would say that the complexity of an MI hypothesis is directly proportional to its number of attributes.
Simpler MI hypotheses tend to be easier to evaluate, both computationally and cognitively, than those with a large number of attributes. As a result, simpler hypotheses are typically less prone to overfitting than complex ones. Great! Now the next obvious headache is to figure out how to generate simpler hypotheses in MI models. A not-so-obvious technique is to attach some form of penalty to a hypothesis based on its estimated complexity. That mechanism tends to favor simpler, approximately accurate hypotheses over more complex and sometimes more accurate ones that could fall apart when new datasets appear.
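The penalty idea can be sketched in a few lines. The example below assumes complexity is measured by the number of attributes, as suggested above, and that the penalty is a fixed cost per attribute added to the training error. The `Hypothesis` class, the candidate values, and the penalty weight are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    n_attributes: int      # proxy for complexity
    training_error: float  # error measured on the training data


def penalized_score(h: Hypothesis, penalty_per_attribute: float) -> float:
    """Training error plus a cost that grows with the number of attributes."""
    return h.training_error + penalty_per_attribute * h.n_attributes


def select(hypotheses, penalty_per_attribute: float) -> Hypothesis:
    """Pick the hypothesis with the lowest penalized score."""
    return min(hypotheses, key=lambda h: penalized_score(h, penalty_per_attribute))


# Illustrative candidates: the complex one fits the training data slightly better.
candidates = [
    Hypothesis("simple", n_attributes=3, training_error=0.08),
    Hypothesis("complex", n_attributes=40, training_error=0.05),
]

print(select(candidates, penalty_per_attribute=0.0).name)   # no penalty
print(select(candidates, penalty_per_attribute=0.01).name)  # with penalty
```

Without a penalty, the selection simply follows training error. With it, the complex hypothesis has to earn each extra attribute through a matching drop in error. Choosing the penalty weight is itself a judgment call, usually settled by validating on held-out data.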
3 — The Bias-Variance Balance
Pedro Domingos is one of my favorite MI researchers and thought leaders. In some of his work, Domingos explains the tension between bias and variance as a way to reason about overfitting in MI models. One of Domingos’ most famous examples uses a clock that is always one hour late to illustrate high bias but low variance. If instead the clock is all over the place but, on average, indicates the right time, then we say it has high variance but low bias.
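The clock analogy translates directly into numbers. In this small sketch, bias is taken to be the average error of the clock and variance the spread of those errors around their average. The readings are made-up values in minutes (shown time minus true time), not data from Domingos' work.

```python
from statistics import mean, pvariance

# Hypothetical clock errors in minutes: shown time minus true time.
late_clock = [-60, -60, -60, -60, -60, -60]   # always one hour late
erratic_clock = [-45, 45, -30, 30, -15, 15]   # erratic, but right on average


def bias_and_variance(errors):
    """Return (average error, variance of the errors around that average)."""
    return mean(errors), pvariance(errors)


late_bias, late_variance = bias_and_variance(late_clock)
erratic_bias, erratic_variance = bias_and_variance(erratic_clock)

print("late clock:   ", late_bias, late_variance)
print("erratic clock:", erratic_bias, erratic_variance)
```

The late clock is consistently wrong by the same amount, so a single correction fixes it. The erratic clock has no systematic offset, so only averaging over many readings brings it close to the truth.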
In the context of MI models, we can regularly compare hypotheses against test datasets and evaluate the results. If the hypotheses keep making the same mistakes, then we have a bias problem and we need to tweak or replace the algorithm. If instead there is no clear pattern to the mistakes, the problem is variance and we need more data.
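That diagnostic rule can be approximated with a rough heuristic. The sketch below assumes we have collected the signed errors of a hypothesis on a test dataset, and it calls the problem "bias" when the average error is large compared with the spread of the errors. The `diagnose` function and its `ratio` threshold are assumptions made for illustration, not an established test.

```python
from statistics import mean, pstdev


def diagnose(errors, ratio: float = 1.0) -> str:
    """Rough heuristic: are the mistakes systematic or scattered?

    errors: signed prediction errors (predicted minus actual) on test data.
    ratio:  how many standard deviations the average error must exceed
            before the mistakes are considered systematic (an assumption).
    """
    if all(e == 0 for e in errors):
        return "no errors"
    average = mean(errors)
    spread = pstdev(errors)
    if abs(average) > ratio * spread:
        return "bias"      # same mistake repeated: revisit the algorithm
    return "variance"      # no clear pattern: gather more data


print(diagnose([2.0, 2.1, 1.9, 2.0]))    # errors cluster around +2
print(diagnose([-2.0, 2.0, -1.5, 1.5]))  # errors scatter around 0
```

In practice both problems are often present at once, so a check like this is better treated as a first hint than as a verdict.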