My Model Knows More than Yours Part II: Five Characteristics of High Quality Knowledge Representations
This is the second part of an essay that explores the factors that influence the quality of knowledge representation in machine learning models. Essentially, we are trying to answer a simple question: what makes a knowledge representation superior to others?
In the first part of the article, we revisited the No-Free-Lunch theorem and explained how generalization is the key element of high quality knowledge representations. We also discussed how, in order to reduce the generalization error, knowledge representations should be able to support efficient regularization techniques. Today, I would like to dig a bit deeper into the specifics and review five key regularization strategies that are relevant to improving representation learning models.
Improving Knowledge by Regularization
Just to get the terminology straight, by regularization we are referring to the ability of a model to reduce its test error (generalization error) without impacting its training error. Every knowledge representation has certain characteristics that make it more amenable to specific regularization techniques. Artificial intelligence luminaries Ian Goodfellow and Yoshua Bengio have done some remarkable work in the area of regularization. Based on Goodfellow and Bengio’s thesis, there are a few characteristics that make knowledge representations more efficient when it comes to regularization. I’ve summarized five of my favorite regularization patterns below:
1 — Disentangling of Causal Factors
One of the key indicators of a robust knowledge representation is that its features correspond to the underlying causes of the training data. This characteristic makes it possible to identify which features in the representation correspond to specific causes in the input dataset and, consequently, to better separate those features from one another.
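To make the idea concrete, here is a minimal toy sketch (not a full causal discovery or ICA algorithm): two independent causal factors are linearly mixed into the observed features, and a representation whose axes align with the original causes disentangles them again. The mixing matrix is purely illustrative.

```python
import numpy as np

# Two independent causal factors generate the data.
rng = np.random.default_rng(0)
causes = rng.normal(size=(2, 100))

# A hypothetical mixing process entangles the causes
# into the observed features.
mixing = np.array([[1.0, 0.5],
                   [0.3, 1.0]])
observed = mixing @ causes

# A representation aligned with the causes (here, obtained by
# inverting the known mixing) recovers each factor separately.
disentangled = np.linalg.inv(mixing) @ observed
print(np.allclose(disentangled, causes))  # True
```

In practice the mixing process is unknown, which is precisely why representations whose features track the underlying causes are so valuable.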
2 — Smoothness
Representation smoothness is the assumption that the value of a hypothesis doesn’t change drastically among points in close proximity in the input dataset. Mathematically, smoothness implies that f(x + ε) ≈ f(x) for a very small ε. This characteristic allows knowledge representations to generalize better across nearby areas of the input dataset.
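A quick numeric check of the assumption, using tanh as a stand-in for a smooth hypothesis (the function and constants are illustrative):

```python
import math

def f(x):
    # A smooth hypothesis: small input changes produce
    # small output changes.
    return math.tanh(2.0 * x)

x, eps = 0.5, 1e-4
delta = abs(f(x + eps) - f(x))
print(delta < 1e-3)  # True: f(x + eps) ≈ f(x) for small eps
```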
3 — Linearity
Linearity is a regularization pattern that is complementary to the smoothness assumption. Conceptually, this characteristic assumes that the relationship between some input variables is linear (f(x) = ax + b), which allows the model to make accurate predictions even for inputs relatively far from the training data.
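A toy illustration of why this helps: once a line is fit to just two observed points, the linearity assumption lets the model extrapolate far outside the range it was trained on (the data here is hypothetical and genuinely linear).

```python
# Two hypothetical observations of the linear relationship y = 3x + 2.
x1, y1 = 1.0, 5.0
x2, y2 = 2.0, 8.0

# Fit the line through the two points.
a = (y2 - y1) / (x2 - x1)
b = y1 - a * x1

# Predict far outside the observed input range.
prediction = a * 100.0 + b
print(prediction)  # 302.0
```

Of course, this only works when the linearity assumption actually holds; on nonlinear data the same extrapolation would fail badly.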
4 — Hierarchical Structures
Knowledge representations based on hierarchies are ideal for many regularization techniques. A hierarchy assumes that every step in the network can be explained by the previous steps, which greatly helps in reasoning through a knowledge representation.
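The hierarchy idea can be sketched as a small two-layer computation: each level is a function of the level below it, so every intermediate value can be traced back to, and explained by, the previous step. The weights here are purely illustrative.

```python
import numpy as np

def relu(z):
    # A simple nonlinearity applied at each level of the hierarchy.
    return np.maximum(z, 0.0)

x = np.array([1.0, -2.0])          # raw input
W1 = np.array([[0.5, -0.1],
               [0.2,  0.4]])       # illustrative first-level weights
W2 = np.array([[1.0, -1.0]])       # illustrative second-level weights

h1 = relu(W1 @ x)   # first-level features, explained by the input
h2 = relu(W2 @ h1)  # second-level features, explained by h1
print(h2)           # [0.7]
```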
5 — Manifold Representation
Manifold learning is one of the most fascinating, mathematically deep foundations of machine learning. Conceptually, a manifold is a connected region of points that can be described well using only a small number of dimensions, even though it is embedded in a higher-dimensional space. The manifold assumption states that probability mass tends to concentrate on low-dimensional manifolds within the input data. The great thing about manifolds is that they are relatively easy to reduce from high dimensional structures to lower dimensional representations which are easier and cheaper to manipulate. Many regularization algorithms are especially efficient at detecting and manipulating manifolds.
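As a minimal sketch of this reduction, consider synthetic 3-D points that lie near a flat 2-D manifold (a plane): PCA, implemented here directly via the SVD, shows that two dimensions capture essentially all of the variance, so the data compresses to a lower-dimensional representation with little loss. The embedding and noise level are illustrative choices.

```python
import numpy as np

# Sample points near a flat 2-D manifold embedded in 3-D space:
# the third coordinate is (almost) determined by the first two.
rng = np.random.default_rng(1)
uv = rng.normal(size=(200, 2))               # intrinsic 2-D coordinates
basis = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, -1.0]])         # embeds the plane in 3-D
points = uv @ basis + 1e-3 * rng.normal(size=(200, 3))  # slight noise

# PCA via SVD: compare variance captured by the top-2 components
# against the total variance.
centered = points - points.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = (s[:2] ** 2).sum() / (s ** 2).sum()
print(explained > 0.999)  # True: two dimensions suffice
```

Real-world manifolds are usually curved rather than flat, which is why nonlinear methods exist, but the payoff is the same: fewer dimensions, cheaper manipulation.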