Assumptions, Dimensionality and Some of the Original Motivations Behind Deep Learning
Deep learning seems to be everywhere today. Not a week goes by in which we don’t hear announcements or press releases about new deep learning technologies. Like any other uber-popular technology trend with mainstream press coverage, deep learning has been subjected to a lot of misconceptions. In my opinion, many articles out there make little distinction between deep learning, machine learning and artificial intelligence, which is extremely confusing even for technically savvy readers.
One way to understand the difference between deep learning and other cognitive computing disciplines such as machine learning or artificial intelligence is to understand some of the original factors that motivated the creation of deep learning from a theoretical standpoint. In principle, deep learning was created to address some of the limitations of traditional machine learning algorithms in several areas. Among those challenges, I like to cite two that, in my opinion, became key accelerators to the rise of deep learning: the curse of dimensionality and assumption formulation.
Too Many Dimensions, Too Little Data: The Challenge of Dimensionality in Machine Learning
Machine learning algorithms existed long before the creation of deep learning. From that perspective, a question we should ask is: what were the limitations of traditional machine learning models that triggered the emergence of deep learning? The answer might boil down to a single word: dimensionality.
Traditional supervised and unsupervised machine learning models operate very efficiently in scenarios with a manageable number of dimensions, but they become increasingly challenged as the number of dimensions in a dataset grows. That challenge is a consequence of the fact that the number of possible combinations of attribute values in a dataset grows exponentially with the number of dimensions. The industry has even labeled that phenomenon with the tragic name of the “curse of dimensionality”.
In the context of traditional machine learning algorithms, the curse of dimensionality becomes apparent when the number of possible configurations of dimensions in a dataset vastly outnumbers the number of training samples. Think about it: the purpose of machine learning algorithms is to generalize knowledge based on input data but, in a high-dimensional space, many combinations of dimensions are likely to have no training data associated with them. At that point, how can we generalize knowledge about regions for which we haven’t seen any factual data? Before you state the obvious fact that the curse of dimensionality can be solved with more training data, consider that, in scenarios with a large number of dimensions, sometimes the data is simply not available or the computational costs associated with the training process are prohibitive.
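To make the mismatch between configurations and samples concrete, here is a quick sketch with made-up numbers: suppose every feature is discretized into just 10 bins, and count how many feature-value “cells” a model would ideally need examples of as the number of dimensions grows.

```python
# Illustrative sketch of the curse of dimensionality.
# The bin count and sample size below are invented for illustration.
bins_per_feature = 10
training_samples = 1_000_000  # a generously large dataset

for dimensions in (2, 3, 6, 9, 12):
    # Each combination of binned feature values is one "cell" of the
    # input space; their count grows exponentially with dimensions.
    cells = bins_per_feature ** dimensions
    # Upper bound on the fraction of cells that could contain even
    # a single training sample.
    coverage = min(1.0, training_samples / cells)
    print(f"{dimensions:2d} dims: {cells:>16,d} cells, coverage <= {coverage:.8f}")
```

Even with a million samples, at 12 dimensions at most one cell in a million can be populated, which is exactly the regime where generalizing from observed data alone breaks down.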
So what can we do?
What do we typically do when we don’t know something? We make assumptions, of course! Similarly, the whole field of deep learning is based on making assumptions in order to generalize knowledge in high-dimensional scenarios. Technically, assumptions in deep learning algorithms are based on “prior beliefs” about the types of hypotheses a model should be learning. For instance, a classic assumption known as the Smoothness Prior states that hypothesis functions should not change drastically within a small region of a dataset. So if we know a certain hypothesis function for a specific segment of the dataset, we should assume that data points close to that segment will comply with a small variation of that hypothesis. Another traditional assumption in deep learning relies on the principle that data was generated by a composition of factors that can be represented in some hierarchical form.
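The Smoothness Prior can be sketched with a deliberately simple predictor: a nearest-neighbor rule, which leans directly on the assumption that the target function barely changes within a small neighborhood. The data and function below are invented for illustration.

```python
# Hypothetical illustration of the Smoothness Prior: predict an unseen
# point by copying the label of its closest training point, trusting
# that f(x) is approximately f(x') whenever x is close to x'.

def nearest_neighbor_predict(train, query_x):
    """Return the label of the training point closest to query_x."""
    nearest_x, nearest_y = min(train, key=lambda pair: abs(pair[0] - query_x))
    return nearest_y

# (x, f(x)) samples of a smooth underlying function, here f(x) = x**2.
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0)]

# Near x = 2.1 the smoothness assumption says f should be close to f(2.0).
print(nearest_neighbor_predict(train, 2.1))  # → 4.0
```

The prediction is only as good as the assumption: where the true function really is smooth, this works well with very little data; where it isn’t, the prior misleads us.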
There are a large number of assumptions made in deep learning models. Assumptions help to generalize knowledge in the absence of data without incurring immense computational costs. I will deep dive into the subject of assumptions in deep learning algorithms in a later post.