The Art of Simplification: Manifold Learning
High-dimensional datasets are one of the biggest challenges for machine learning algorithms. In datasets with a large number of dimensions, the number of distinct variable combinations can grow exponentially, making learning impractical due to the computational cost. This is known in machine learning theory as the curse of dimensionality, and it remains one of the main challenges that triggered the rise of deep learning. Among the many techniques used to address the curse of dimensionality, manifold learning has become incredibly popular within deep learning models.
The Art of Simplification
In order to understand manifold learning, it might be useful to draw some analogies from how our brains simplify knowledge. Think about the last time you explained a complex subject to a friend or colleague. Most likely, you didn't try to convey your entire knowledge of the subject, which probably took you years to acquire. Instead, you presented the most relevant elements so that the recipient of the explanation could get a general idea of the topic: you decided to simplify.
Simplification is a cognitive process that helps us learn about complex subjects. The cognitive science behind simplification is, ironically, pretty complex and involves many heterogeneous sub-processes. Among the many things our mind does when trying to simplify a subject, there are two that are very relevant to the concept of manifold learning:
a) We try to determine the most important parts of the subject of knowledge and skip the rest.
b) We use analogies to help us understand specific subjects.
The combination of simplification and analogies is at the core of manifold learning.
A manifold is a fascinating mathematical structure that abstracts a connected region in which each point is associated with a set of points in its neighborhood. Because of these connections, points in a manifold can be transformed into other points with minimal effort. In the context of machine learning, manifolds are used to represent a connected set of data points that can be modeled as transformations from a higher-dimensional space.
Confused already? :) It's really simpler than it sounds. The key assumption of manifold learning is that in a high-dimensional structure, most relevant information is concentrated in a small number of low-dimensional manifolds. This is known in machine learning theory as the Manifold Hypothesis, and it includes two main points: data distribution and connectivity.
One part of the Manifold Hypothesis assumes that the probability distribution of datasets such as images or text is highly concentrated. A classic example that illustrates this point is trying to generate a meaningful text by randomly selecting words from a long text. Obviously, this is very unlikely to work, because coherent natural language sentences occupy only a tiny fraction of the space of all possible combinations of letters in the original text.
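A quick back-of-the-envelope calculation makes this concentration concrete. The word-count estimate below is an illustrative assumption, not a precise linguistic figure:

```python
# How concentrated is "meaningful" text inside the space of all letter strings?
alphabet_size = 26
string_length = 10
total_strings = alphabet_size ** string_length  # all possible 10-letter strings

# Rough, illustrative estimate of English 10-letter words; the exact
# count varies by dictionary, but the order of magnitude is what matters.
english_words_estimate = 20_000

fraction = english_words_estimate / total_strings
print(f"Total 10-letter strings: {total_strings:.2e}")
print(f"Fraction that are English words: {fraction:.2e}")
```

Even with a generous word estimate, meaningful strings make up a vanishingly small fraction of the full space, which is exactly the kind of concentration the Manifold Hypothesis describes.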
The second element of the Manifold Hypothesis is connectivity, which entails that relevant data points are connected to other relevant data points. We can see this clearly in images, in which a pixel can be obtained via a simple transformation of another pixel in its neighborhood.
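This neighborhood structure is easy to check numerically. The sketch below uses a synthetic gradient image as a stand-in for a natural photo (a simplifying assumption), and compares how similar adjacent pixels are versus randomly paired pixels:

```python
import numpy as np

rng = np.random.default_rng(0)

# A smooth synthetic "image": a 2-D gradient plus mild noise,
# mimicking how intensity varies gradually in natural images.
h, w = 64, 64
y, x = np.mgrid[0:h, 0:w]
image = (x + y).astype(float) + rng.normal(scale=1.0, size=(h, w))

# Mean absolute difference between each pixel and its right-hand neighbor.
neighbor_diff = np.abs(np.diff(image, axis=1)).mean()

# Mean absolute difference between randomly chosen pixel pairs.
flat = image.ravel()
pairs = rng.choice(flat.size, size=(10_000, 2))
random_diff = np.abs(flat[pairs[:, 0]] - flat[pairs[:, 1]]).mean()

print(f"mean neighbor difference:    {neighbor_diff:.2f}")
print(f"mean random-pair difference: {random_diff:.2f}")
```

Neighboring pixels turn out to be far closer in value than arbitrary pairs, which is the connectivity property the Manifold Hypothesis relies on.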
Manifold learning relies on the Manifold Hypothesis to extract relevant knowledge from a high-dimensional dataset by transforming it into manifold structures of a lower dimension. This technique drastically reduces the computational cost of many deep learning models and has become an increasingly important element of deep learning solutions.
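To make this concrete, here is a minimal sketch using scikit-learn's Isomap, one of several classic manifold learning algorithms (the choice of algorithm and parameters here is illustrative, not prescribed by the text). The swiss roll is a 2-D sheet curled up in 3-D space, a standard stand-in for data that lies on a low-dimensional manifold:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Sample 1000 points from a swiss roll embedded in 3-D space.
X, color = make_swiss_roll(n_samples=1000, random_state=0)
print(X.shape)  # (1000, 3)

# Isomap approximates geodesic distances along the manifold via a
# nearest-neighbor graph, then embeds the points in fewer dimensions.
embedding = Isomap(n_neighbors=10, n_components=2)
X_2d = embedding.fit_transform(X)
print(X_2d.shape)  # (1000, 2)
```

The algorithm effectively "unrolls" the sheet: points that were close along the manifold stay close in the 2-D embedding, even though the data originally lived in three dimensions.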