Learning by Committee: Ensemble Learning and Artificial Intelligence
Most artificial intelligence (AI) algorithms focus on selecting a hypothesis from the problem space and using it to make predictions. In that sense, most AI techniques are specializations of processes that discover and optimize hypotheses in order to make the most accurate predictions. Each class of AI algorithms provides its own model for trying to select the best available hypothesis. The question is… what happens when they fail to do so?
Ensemble learning is an increasingly popular technique in AI circles that tries to overcome some of the risks of single-hypothesis algorithms. Conceptually, the idea behind ensemble learning is to select a collection (ensemble) of hypotheses and combine their individual predictions into a final prediction. Think of ensemble learning as posing a question to a panel of experts, each of whom provides a different answer and explanation, which you then combine into a final answer.
The reasoning behind ensemble learning is as simple as it is clever. If the selected hypotheses are not completely stupid (which they are not in most cases), combining their predictions should at least help reduce the error rate and, in many cases, improve the final prediction of the algorithm. Consider a simple classification example that determines whether a person is likely to vote Republican or Democrat. Instead of evaluating a single hypothesis, an ensemble learning algorithm could generate 20 different decision trees and make a final prediction by simple majority voting. In that trivial example, at least 11 hypotheses would have to get it wrong for the ensemble to misclassify the voter. This is, of course, assuming that no Russian hackers are involved ;)
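The majority-voting idea fits in a few lines of Python. This is just a sketch of the combination step; the 11-to-9 split mirrors the hypothetical scenario above, not real classifier output:

```python
from collections import Counter

def majority_vote(predictions):
    """Final ensemble answer: the most common individual prediction."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical votes from 20 decision trees on a single voter:
# even with 9 trees wrong, the ensemble still classifies correctly.
votes = ["Republican"] * 11 + ["Democrat"] * 9
print(majority_vote(votes))  # -> Republican
```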
Boosting is, arguably, one of the most popular ensemble learning methods in modern AI. The algorithm leverages the notion of a weighted training set, in which each example has an associated weight based on its importance. The higher the weight, the higher the importance of that example relative to the target hypothesis.
At the beginning, the Boosting algorithm assigns all training examples the same weight and proceeds to evaluate the first hypothesis. After the initial hypothesis is evaluated, the algorithm increases the weight of the misclassified examples and decreases the weight of the examples classified correctly. With the weights updated, the Boosting algorithm proceeds to generate and evaluate the second hypothesis, and it repeats the same process until it has successfully generated N hypotheses (N is typically an input of the Boosting algorithm). The final ensemble hypothesis combines all N hypotheses weighted according to how they performed on the training set. For the AI geeks, I recommend checking out the AdaBoost algorithm, which is one of the most common implementations of the boosting model and is included in many popular AI frameworks.
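The weight-update loop above can be sketched from scratch in the AdaBoost style. This is a minimal illustration, not a production implementation: the 1-D "decision stumps", toy data points, and noisy label are all made up for the example, and the weak-learner search is a brute-force scan over candidate thresholds:

```python
import math

def adaboost(X, y, stumps, n_rounds):
    """Minimal AdaBoost sketch over 1-D data.

    X: list of numbers; y: labels in {-1, +1};
    stumps: candidate weak classifiers (functions x -> -1 or +1).
    Returns the ensemble as a list of (alpha, stump) pairs.
    """
    n = len(X)
    w = [1.0 / n] * n                              # start with uniform weights
    ensemble = []
    for _ in range(n_rounds):
        # pick the stump with the lowest error on the current weights
        best, best_err = None, float("inf")
        for h in stumps:
            err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
            if err < best_err:
                best, best_err = h, err
        best_err = max(best_err, 1e-10)            # guard against a perfect stump
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        # raise the weights of misclassified examples, lower the rest, renormalize
        w = [wi * math.exp(-alpha * yi * best(xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Final hypothesis: a performance-weighted vote of all weak hypotheses."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: one "noisy" label (index 3) that no single
# threshold can handle, so later rounds must focus on it.
X = [0, 1, 2, 3, 4, 5]
y = [1, 1, -1, 1, -1, -1]
stumps = []
for t in range(1, 6):                              # thresholds in both polarities
    stumps.append(lambda x, t=t: 1 if x < t else -1)
    stumps.append(lambda x, t=t: -1 if x < t else 1)

model = adaboost(X, y, stumps, n_rounds=3)
print([predict(model, xi) for xi in X])  # -> [1, 1, -1, 1, -1, -1], matching y
```

In practice you would reach for a library implementation such as scikit-learn's `AdaBoostClassifier` rather than rolling your own, but the weight updates it performs are exactly the ones shown here.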
One of the drawbacks of ensemble learning is the propensity to generate complex hypotheses and, consequently, fall victim to overfitting (poor generalization). Like everything in life, simpler explanations are better but, many times, simple and accurate are hard to find :).