My Model Knows More Than Yours: Representation Learning and Knowledge Quality Part I
Last week, we presented the notion of Transfer Learning as a way to create knowledge representations that can be reused across different tasks. Transfer learning is a specific subset of a discipline known as representation learning, which deals with structuring and optimizing knowledge in machine learning algorithms.
If you think about deep learning as a subset of machine learning, then representation learning is the domain in between. From that perspective, representation learning can be considered a subset of machine learning and a superset of deep learning:
Machine Learning ==> Representation Learning ==> Deep Learning
The central problem of representation learning is to determine an optimal representation for the input data. In the context of deep learning, the quality of a representation is mostly given by how much it facilitates the learning process. In the real world, the learning algorithm and the underlying representation of a model are directly related.
The No Free Lunch Theorem
So if the knowledge representation of a model is tied to its learning algorithm, then selecting the correct representation should be trivial, right? We simply pick the knowledge representation associated with the learning task and that should guarantee optimal performance. I wish it were that simple. In the journey to find an optimal representation we quickly run into an old friend: the No Free Lunch Theorem (NFLT).
Remember NFLT? We discussed it in detail a few weeks ago, so I am not going to go deep into it in this article. In a nutshell, NFLT states that, averaged over all possible data-generating distributions, every machine learning algorithm has the same error rate when classifying previously unobserved points (read my previous article about NFLT). In other words, no machine learning algorithm is universally better than any other once performance is averaged over every possible task.
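To make the NFLT statement concrete, here is a minimal sketch under toy assumptions: the inputs are all 3-bit vectors, five of them form the training set, and we average the error on the three unseen points over every possible labeling of the inputs. The two learners and the helper name average_unseen_error are hypothetical, chosen only for illustration.

```python
from itertools import product

# All 3-bit inputs; the first five are the training set, the rest are unseen.
POINTS = list(product([0, 1], repeat=3))
TRAIN, TEST = POINTS[:5], POINTS[5:]


def majority_learner(train_pairs, x):
    """Ignore x and predict the most common training label (ties go to 1)."""
    ones = sum(label for _, label in train_pairs)
    return 1 if 2 * ones >= len(train_pairs) else 0


def nearest_neighbor_learner(train_pairs, x):
    """Predict the label of the closest training point in Hamming distance."""
    def hamming(a, b):
        return sum(ai != bi for ai, bi in zip(a, b))

    _, label = min(train_pairs, key=lambda pair: hamming(pair[0], x))
    return label


def average_unseen_error(learner):
    """Average error on unseen points over every possible target function."""
    mistakes = 0
    total = 0
    for labels in product([0, 1], repeat=len(POINTS)):
        target = dict(zip(POINTS, labels))
        train_pairs = [(x, target[x]) for x in TRAIN]
        for x in TEST:
            mistakes += learner(train_pairs, x) != target[x]
            total += 1
    return mistakes / total


if __name__ == "__main__":
    for learner in (majority_learner, nearest_neighbor_learner):
        print(learner.__name__, average_unseen_error(learner))
```

Both learners land on the same average error for the unseen points. Once every labeling is treated as equally likely, the training labels carry no information about the unseen ones, so no strategy can pull ahead. Real tasks are not uniform over all labelings, which is exactly why the choice of representation matters in practice.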
In the context of representation learning, NFLT implies that no single knowledge representation is best for every task, so multiple representations can be applicable to a given learning task. If that’s the case, how can we empirically decide on one knowledge representation vs. another? The answer is one of the core, and often ignored, techniques in machine learning and deep learning models: regularization.
A core task of machine learning algorithms is to perform well with new inputs outside the training dataset. Optimizing that task is the role of regularization. Conceptually, regularization is any modification to a machine learning algorithm that is intended to reduce the test (generalization) error rather than the training error. In practice, it often accepts a slightly higher training error in exchange for better performance on unseen data.
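As a rough illustration of that trade-off, the sketch below fits a degree-9 polynomial to a handful of noisy points with and without an L2 (ridge) penalty. Everything here is an assumption made for the example: the sine-shaped data, the polynomial degree, the penalty strength lam, and the helper names. L2 regularization is only one of many strategies, and for simplicity the penalty also applies to the intercept term.

```python
import numpy as np


def polynomial_features(x, degree):
    """Columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def fit_ridge(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 via an augmented least-squares system."""
    n_features = X.shape[1]
    A = np.vstack([X, np.sqrt(lam) * np.eye(n_features)])
    b = np.concatenate([y, np.zeros(n_features)])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w


def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))


def compare(lam, degree=9, n_train=15, n_test=200, noise=0.3, seed=0):
    """Train on a small noisy sample and report train/test error for a given lam."""
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(-1, 1, n_train)
    x_test = rng.uniform(-1, 1, n_test)
    y_train = np.sin(np.pi * x_train) + noise * rng.normal(size=n_train)
    y_test = np.sin(np.pi * x_test) + noise * rng.normal(size=n_test)

    X_train = polynomial_features(x_train, degree)
    X_test = polynomial_features(x_test, degree)
    w = fit_ridge(X_train, y_train, lam)
    return {
        "train_mse": mse(X_train, y_train, w),
        "test_mse": mse(X_test, y_test, w),
        "weight_norm": float(np.linalg.norm(w)),
    }


if __name__ == "__main__":
    for lam in (0.0, 0.1):
        print(lam, compare(lam))
```

With lam set to 0.0 the model is free to chase the noise in the training points. With a positive lam the training error can only stay the same or go up, while the weights shrink. Whether the test error actually improves depends on the data and on lam, so it is worth trying a few values and seeds rather than trusting a single run.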
Let’s now come full circle and see how regularization is related to representation learning. The relationship is crystal clear: the quality of a knowledge representation is fundamentally related to its ability to generalize knowledge efficiently. In other words, the knowledge representation must be able to adapt to new inputs outside the training dataset. To perform well with new inputs and reduce the generalization error, any representation of knowledge should work well with regularization techniques. Therefore, the quality of representation learning models is directly influenced by their ability to work with different regularization strategies. The next step is to figure out which regularization strategies are specifically relevant in representation learning. That will be the topic of a future post.