In my recent series about natural language processing (NLP), I mentioned that neural language models (NLMs) derive their effectiveness from a technique known as distributed representations (see my previous articles about natural language processing). As a follow-up to those articles, I thought it would be a good idea to expand on the concept of distributed representations, as it has become a widely adopted technique in the deep learning ecosystem.
Learning by Representation
The first thing to understand about distributed representations is that they are considered a form of a larger deep learning discipline known as representation learning. Conceptually, representation learning focuses on optimizing knowledge representations and reusing them across models. Representation learning is used in scenarios that fall outside the supervised learning umbrella and that involve large volumes of unlabeled data.
The most famous instance of representation learning is known as transfer learning, which enables the reuse of optimized knowledge across different domains. Other notable variations of representation learning include unsupervised pretraining (see my article about unsupervised pretraining) and semi-supervised learning.
One of the main challenges of representation learning arises in scenarios in which a large volume of unlabeled data contains knowledge based on a large number of underlying concepts. This is the specific area in which distributed representations can be useful.
Imagine a scenario that uses NLP to understand the characteristics of objects based on large text datasets. For instance, our model can learn that human brains are divided into cortices made of interconnected neurons, which connect through structures known as axons and dendrites. Let’s now extrapolate that example to scenarios in which each concept can have n different attributes, each with v possible values. Combining all the possible feature-value assignments, our model will be able to represent v^n different concepts, a number that, in many cases, can be overwhelming for traditional models: a symbolic model needs a separate symbol for each of those concepts, whereas a distributed representation describes any of them with just n attribute values.
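To make the v^n count concrete, here is a minimal Python sketch that enumerates every attribute-value combination in a tiny attribute space. The attribute names and values are hypothetical, invented purely for illustration rather than taken from any real dataset. With n = 3 attributes and v = 2 values each, the enumeration yields 2^3 = 8 distinct concepts.

```python
from itertools import product

# Hypothetical toy attribute space (illustrative only): n = 3 attributes, v = 2 values each.
TOY_ATTRIBUTES = {
    "legs": ["two", "four"],
    "covering": ["fur", "feathers"],
    "size": ["small", "large"],
}

def enumerate_concepts(attributes):
    """Return one dict per concept, i.e. per combination of attribute values."""
    names = list(attributes)
    value_lists = [attributes[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

concepts = enumerate_concepts(TOY_ATTRIBUTES)
n = len(TOY_ATTRIBUTES)  # number of attributes
v = 2                    # values per attribute in this toy space

print(len(concepts), v ** n)  # 8 8
print(concepts[0])            # {'legs': 'two', 'covering': 'fur', 'size': 'small'}

# The count grows exponentially: 10 attributes with 4 values each give 4**10 concepts.
print(4 ** 10)                # 1048576
```

Each concept in the list is fully described by its n attribute values, while the number of distinct concepts grows exponentially with n.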
Distributed representations are ideal for analyzing multi-attribute datasets and identifying the relevant categories or symbols that correctly represent the input.
Distributed vs. Symbolic Representations
A good way to understand distributed representations is by understanding what they are not. The opposite of distributed representations is known as symbolic representations, which are the foundation behind algorithms such as decision trees, clustering models, or even NLP algorithms such as n-grams (see my article about NLP and n-grams). Symbolic representations typically associate an input record with a single symbol or category. While symbolic representations are an effective way to model a computational graph, they can run into issues trying to generalize knowledge in multi-attribute datasets.
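As a rough illustration of the "single symbol per record" idea, the following Python sketch encodes symbols as one-hot vectors, a common way to turn categories into numbers. The vocabulary is a made-up example. Because every symbol gets its own dimension, any two distinct symbols are orthogonal, so nothing a model learns about "cat" automatically carries over to "dog".

```python
import math

# Hypothetical vocabulary of symbols (illustrative only).
VOCABULARY = ["cat", "dog", "fish"]

def one_hot(symbol, vocabulary):
    """Symbolic encoding: a single 1.0 at the symbol's position, 0.0 everywhere else."""
    return [1.0 if entry == symbol else 0.0 for entry in vocabulary]

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cat = one_hot("cat", VOCABULARY)
dog = one_hot("dog", VOCABULARY)

print(cat)               # [1.0, 0.0, 0.0]
print(cosine(cat, dog))  # 0.0 -> distinct symbols share nothing
print(cosine(cat, cat))  # 1.0
```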
Benefits of Distributed Representations
The main advantage of distributed representations over symbolic models is that the former tend to generalize better in scenarios with unlabeled data. This is mostly because distributed representations can identify shared attributes between concepts. For instance, let’s take a distributed representation algorithm that is interacting with a large dataset of pictures of animals. In that scenario, the target model can understand that attributes such as “four_legs”, “head” or “brain” are common to cats and dogs. Among other things, that capability is the underlying reason why NLP models such as NLMs, which operate using distributed representations, tend to outperform statistical techniques such as n-grams.
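The shared-attribute argument can be sketched in Python by describing each animal with a handful of named attribute dimensions. This is an assumption made for illustration: the attribute list below is hand-picked, whereas a real model such as an NLM learns dense, real-valued features on its own. Unlike a single-symbol encoding, cats and dogs now overlap on "four_legs", "head" and "brain", so their vectors are measurably similar.

```python
import math

# Hypothetical, hand-picked attribute dimensions (a trained model would learn its own features).
ATTRIBUTES = ["four_legs", "head", "brain", "fins", "barks"]

def encode(features):
    """Distributed encoding: one slot per attribute, 1.0 if the concept has it."""
    return [1.0 if name in features else 0.0 for name in ATTRIBUTES]

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def shared_attributes(a, b):
    """Names of the attributes that both concepts have."""
    return [name for name, x, y in zip(ATTRIBUTES, a, b) if x and y]

cat = encode({"four_legs", "head", "brain"})
dog = encode({"four_legs", "head", "brain", "barks"})
fish = encode({"head", "brain", "fins"})

print(shared_attributes(cat, dog))   # ['four_legs', 'head', 'brain']
print(round(cosine(cat, dog), 3))    # 0.866
print(round(cosine(cat, fish), 3))   # 0.667
```

Because the cat and dog vectors overlap, anything a model learns about those shared dimensions applies to both animals, which is the kind of generalization the paragraph above describes.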