Overfitting is one of the biggest challenges in modern machine intelligence (MI) solutions. I often like to compare MI overfitting to human hallucinations, as the former occurs when algorithms start inferring non-existent patterns in datasets. Despite its notoriety, there is no easy solution to overfitting, and MI applications often need to use techniques very specific to individual algorithms in order to avoid overfitting behaviors. This problem gets even scarier if you consider that humans are also incredibly prone to overfitting. Just think about how many stereotypes you used in the last week. Yeah, I know….
Unquestionably, our hallucinations or illusions of validity are present somewhere in the datasets used to train MI algorithms, which creates an even more chaotic picture.
Borges and MI Knowledge
Intuitively, we think about data when working on MI algorithms, but there is another equally important and often forgotten element of MI models: knowledge. In the context of MI algorithms, data is often represented as persisted records in one or more databases, while knowledge is typically represented as logic rules that can be validated against the data. The role of MI models is to infer rules that can be applied to new datasets in the same domain. Unfortunately for MI agents, unlimited computation power is not a direct answer to knowledge building.
Jorge Luis Borges is considered one of the most emblematic Latin American writers, and he was one of my favorite authors during my teenage years. In his story “Funes the Memorious”, Borges tells the story of Funes, a young man with a prodigious memory. Funes is able to remember the exact details of everything he sees, like the shapes of the clouds in the sky at 3:45pm yesterday. However, Funes is tormented by his inability to generalize visual information into knowledge. Borges’ character is regularly surprised by his own image every time he sees himself in the mirror and is unable to determine if the dog seen from the side at 3:14pm is the same dog seen from the front at 3:15pm. To Funes, two things are the same only if every single detail is identical in both of them.
Funes’ story is a great metaphor to explain that knowledge is not only about processing large volumes of information but also about generalizing rules that ignore some of the details in the data. Just like Funes, MI algorithms have almost unlimited capacity to process information. That computation power is a direct cause of overfitting, as MI agents can infer millions of patterns in data sources without incurring a major cost.
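To make the Funes metaphor concrete, here is a minimal Python sketch that compares a model that remembers every training example with a model that keeps a single rule. Everything in it is an assumption made for illustration: the toy data (points between 0 and 1, labeled by whether they exceed 0.5), the 20% of labels flipped as noise, and the helper names make_data, memorizer_predict and fit_threshold.

```python
import random

def make_data(n, noise, rng):
    """Points x in [0, 1); the true rule is x > 0.5, with a fraction of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label  # simulated labeling noise
        data.append((x, label))
    return data

def memorizer_predict(train, x):
    """'Funes' model: answer with the label of the single closest remembered example."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def fit_threshold(train):
    """Generalizing model: keep the one cut-off that makes the fewest training errors."""
    candidates = sorted(x for x, _ in train)
    return min(candidates, key=lambda t: sum((x > t) != y for x, y in train))

def accuracy(predict, data):
    """Fraction of examples whose predicted label matches the recorded label."""
    return sum(predict(x) == y for x, y in data) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(200, noise=0.2, rng=rng)
test = make_data(2000, noise=0.2, rng=rng)

threshold = fit_threshold(train)
memo_train = accuracy(lambda x: memorizer_predict(train, x), train)
memo_test = accuracy(lambda x: memorizer_predict(train, x), test)
rule_train = accuracy(lambda x: x > threshold, train)
rule_test = accuracy(lambda x: x > threshold, test)

print(f"memorizer: train={memo_train:.2f} test={memo_test:.2f}")
print(f"one rule:  train={rule_train:.2f} test={rule_test:.2f}")
```

The memorizer reproduces its training set perfectly, noise included, and that is exactly why it tends to do worse on data it has never seen. The single threshold ignores most of the detail and accepts some training errors in exchange for a rule that carries over to new points.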
What You Don’t See is as Important as What You See
During World War II, the Pentagon assembled a team of the country’s most renowned mathematicians in order to develop statistical models that could assist the Allied troops during the war. One of the first assignments consisted of estimating the level of extra protection that should be added to US planes in order to survive the battles with the German air force. Like good statisticians, the team recorded the damage sustained by planes returning from encounters with the Nazis.
For each plane, the mathematicians counted the number of bullet holes across different parts of the plane (doors, wings, motor, etc.). The group then proceeded to make recommendations about which areas of the planes should have additional protection. Not surprisingly, the vast majority of the recommendations focused on the areas that had more bullet holes, assuming that those were the areas targeted by the German planes. There was one exception in the group: a young statistician recommended focusing the extra protection on the areas that hadn’t shown any damage in the inventoried planes. Why? Very simply, the young mathematician argued that the input data set (planes) only included planes that had survived the battles with the Germans. Although severe, the damage suffered by those planes was not catastrophic enough to keep them from returning to base. Therefore, he concluded that the planes that didn’t return were likely to have suffered impacts in other areas. Very clever, huh?
The previous story has some very profound lessons for MI techniques that try to prevent overfitting. The only way to validate new knowledge is to apply it to unseen datasets, and many times the missing datasets are as important as the existing ones. This is known in cognitive psychology as “learning by omission”. As many scientists know: “one million experiments are not enough to prove you right but a single one might be enough to prove you wrong”.
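The bullet-hole story can be reproduced as a small simulation. The following Python sketch is only an illustration under invented assumptions: the list of plane sections, the chance that a hit in each section brings a plane down (LOSS_PROB), and the number of hits per mission are all hypothetical values chosen to show the effect, not historical figures.

```python
import random
from collections import Counter

SECTIONS = ["engine", "wings", "fuselage", "tail"]
# Hypothetical chance that a single hit in each section brings the plane down.
LOSS_PROB = {"engine": 0.6, "wings": 0.1, "fuselage": 0.05, "tail": 0.1}

def fly_mission(rng, max_hits=6):
    """Return (hits, survived) for one plane; hits land uniformly across sections."""
    hits = [rng.choice(SECTIONS) for _ in range(rng.randint(0, max_hits))]
    # The plane returns only if every individual hit fails to bring it down.
    survived = all(rng.random() >= LOSS_PROB[s] for s in hits)
    return hits, survived

rng = random.Random(42)  # fixed seed so the run is repeatable
all_hits, survivor_hits = Counter(), Counter()
for _ in range(10_000):
    hits, survived = fly_mission(rng)
    all_hits.update(hits)           # what really happened in the air
    if survived:
        survivor_hits.update(hits)  # what the analysts get to inspect

def share(counter, section):
    """Fraction of all recorded bullet holes that fall in the given section."""
    return counter[section] / sum(counter.values())

for s in SECTIONS:
    print(f"{s:9s} all planes: {share(all_hits, s):.2f}   "
          f"returned planes: {share(survivor_hits, s):.2f}")
```

In the simulated air every section is hit equally often, yet the planes that return show relatively few engine holes, because planes hit in the engine rarely make it back. An analyst who only sees the returned planes would armor the wrong sections, which is the same trap as judging a model only on the data it was trained on.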