My Model Knows More than Yours Part II: Five Characteristics of High Quality Knowledge Representations

This is the second part of an essay that explores the factors that influence the quality of knowledge representation in machine learning models. Essentially, we are trying to answer a simple question: what makes one knowledge representation superior to another?

In the first part of the article, we revisited the No-Free-Lunch theorem and explained how generalization is the key element of high quality knowledge representations. Similarly, we discussed how, in order to reduce the generalization error, knowledge representations should support efficient regularization techniques. Today, I would like to dig a bit deeper into the specifics and review five key regularization patterns that are relevant to improving representation learning models.

Improving Knowledge by Regularization

Just to get the terminology straight: by regularization we are referring to the ability of a model to reduce its test error (generalization error) without impacting its training error. Every knowledge representation has certain characteristics that make it more amenable to specific regularization techniques. Artificial intelligence luminaries Ian Goodfellow and Yoshua Bengio have done some remarkable work in the area of regularization. Based on Goodfellow and Bengio’s thesis, there are a few characteristics that make knowledge representations more efficient when it comes to regularization. I’ve summarized five of my favorite regularization patterns below:

1 — Disentangling of Causal Factors

One of the key indicators of a robust knowledge representation is that its features correspond to the underlying causes of the training data. This characteristic helps identify which features in the representation correspond to specific causes in the input dataset and, consequently, makes it easier to separate some features from others.
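One classic technique for recovering independent underlying causes from mixed observations is independent component analysis. The sketch below is a hypothetical illustration (not a method from the essay) using scikit-learn's FastICA: two independent source signals are linearly mixed, and ICA recovers disentangled factors from the mixtures.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                        # independent causal factor 1
s2 = np.sign(np.sin(3 * t))               # independent causal factor 2
S = np.c_[s1, s2] + 0.05 * rng.standard_normal((2000, 2))

# A hypothetical mixing matrix: the observed features entangle the causes
A = np.array([[1.0, 0.5], [0.5, 1.0]])
X = S @ A.T

# FastICA estimates factors that are statistically independent
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)

# The recovered factors are (approximately) uncorrelated
corr = np.corrcoef(S_hat.T)[0, 1]
print(f"correlation between recovered factors: {corr:.4f}")
```

The point of the example is the shape of the problem, not this particular algorithm: a representation whose coordinates track the independent generating causes is easier to regularize than one whose coordinates mix them.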

2 — Smoothness

Representation smoothness is the assumption that the value of a hypothesis doesn’t change drastically between points in close proximity in the input dataset. Mathematically, smoothness implies that f(x + ε) ≈ f(x) for a very small ε. This characteristic allows knowledge representations to generalize better across close areas in the input dataset.
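The assumption is easy to check numerically. The toy functions below are hypothetical examples: a smooth function barely changes under a tiny input perturbation, while a step function can violate smoothness completely near its discontinuity.

```python
import numpy as np

def smooth_f(x):
    # Small input changes produce proportionally small output changes
    return np.sin(x)

def jumpy_f(x):
    # A tiny step across 0 flips the output entirely
    return np.sign(x)

eps = 1e-4
print(abs(smooth_f(0.0 + eps) - smooth_f(0.0)))   # on the order of eps
print(abs(jumpy_f(-eps / 2) - jumpy_f(eps / 2)))  # maximal change: 2.0
```

Models that rely on smoothness (nearest neighbors, kernel methods, most neural networks) can interpolate confidently between nearby training points; around discontinuities, that confidence is misplaced.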


3 — Linearity

Linearity is a regularization pattern that is complementary to the smoothness assumption. Conceptually, this characteristic assumes that the relationship between some input variables is linear (f(x) = ax + b), which allows the model to make accurate predictions even when there are relatively large variations in the input.
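That extrapolation benefit is easy to demonstrate. In this hypothetical sketch, a line is fit by least squares on inputs drawn from [0, 1], and the linearity assumption lets us predict accurately at x = 10, far outside the training range.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 3.0 * x + 2.0 + 0.01 * rng.standard_normal(50)  # true f(x) = 3x + 2, light noise

# Least-squares fit of a degree-1 polynomial (a line)
a_hat, b_hat = np.polyfit(x, y, 1)

# Under the linearity assumption, predictions remain valid well beyond [0, 1]
x_far = 10.0
pred = a_hat * x_far + b_hat
print(pred)  # close to 3 * 10 + 2 = 32
```

A smoothness assumption alone would say nothing about x = 10, since no training point is nearby; linearity is the stronger assumption that buys the long-range prediction.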

4 — Hierarchical Structures

Knowledge representations based on hierarchies are ideal for many regularization techniques. A hierarchy assumes that every step in the network can be explained in terms of the previous steps, which tremendously helps to reason through a knowledge representation.
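A deep network is the canonical example of this pattern: each layer's features are defined as functions of the previous layer's features. The minimal numpy sketch below (layer sizes are hypothetical) shows that hierarchical composition explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(h, w, b):
    # Each layer's features are built from the previous layer's features (ReLU)
    return np.maximum(0.0, h @ w + b)

x = rng.standard_normal((4, 8))                        # batch of 4 raw inputs
w1, b1 = rng.standard_normal((8, 16)), np.zeros(16)    # low-level features
w2, b2 = rng.standard_normal((16, 8)), np.zeros(8)     # mid-level features
w3, b3 = rng.standard_normal((8, 2)), np.zeros(2)      # high-level features

h1 = layer(x, w1, b1)    # explained by the raw input
h2 = layer(h1, w2, b2)   # explained by h1
h3 = layer(h2, w3, b3)   # explained by h2
print(h3.shape)
```

Because each level only depends on the one below it, we can inspect, regularize, or swap out any level in isolation — which is exactly what makes hierarchies easy to reason through.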

5 — Manifold Representation

Manifold learning is one of the most fascinating, mathematically deep foundations of machine learning. Conceptually, a manifold is a connected region of points that locally resembles a lower-dimensional space, even though it may be embedded in a high dimensional one. The manifold assumption states that probability mass tends to concentrate on manifolds in the input data. The great thing about manifolds is that they are relatively easy to reduce from high dimensional structures to lower dimensional representations, which are easier and cheaper to manipulate. Many regularization algorithms are especially efficient at detecting and manipulating manifolds.
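A quick illustration of the manifold assumption, using a hypothetical dataset: points concentrated near a one-dimensional line embedded in three-dimensional space. A linear dimensionality reduction (PCA, computed here via the SVD) finds that a single direction captures almost all of the variance, so the data can be represented cheaply in one dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data concentrated near a 1-D manifold (a line) embedded in 3-D space
t = rng.uniform(-1, 1, 500)
X = np.c_[t, 2 * t, -t] + 0.01 * rng.standard_normal((500, 3))

# PCA via SVD: fraction of variance explained by each principal direction
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(explained)  # the first component dominates
```

Real manifolds of interest (images, speech) are curved rather than linear, so nonlinear methods are used in practice, but the principle is the same: the data occupies far fewer effective dimensions than the ambient space.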

CEO of IntoTheBlock, Chief Scientist at Invector Labs, I write The Sequence Newsletter, Guest lecturer at Columbia University, Angel Investor, Author, Speaker.
