Understanding Energy Based Models
Some of the key concepts behind a popular form of generative models.
Generative models have become one of the hottest topics in modern machine learning (ML). These deep learning architectures focus on observing data, such as images or text, and learning to model the underlying data distribution. Among the many forms of generative models, energy-based models (EBMs) have been gaining popularity in recent years. As the name indicates, EBMs borrow some concepts from statistical physics and apply them to deep neural network architectures.
Like other generative models, EBMs are able to learn the underlying distribution of a dataset and generate samples that match that distribution. What makes EBMs different is the underlying mechanics used to accomplish that task. Specifically, EBMs represent probability distributions over data by assigning a scalar “energy” to each input data point. The energy acts as an unnormalized score: lower energy corresponds to higher probability. In that framework, tasks like prediction consist of fixing the observed variables and finding values of the remaining variables that minimize the energy. Similarly, learning is modeled as finding an energy function that assigns low energies to correct values of the remaining variables, and higher energies to incorrect values.
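To make the idea of prediction as energy minimization concrete, here is a minimal sketch in Python. It is not taken from any particular EBM implementation: the quadratic energy function, the candidate values and the function names are all hypothetical, chosen only for illustration. The conversion from energies to probabilities through exp(-E) divided by a normalizing constant (the Gibbs or Boltzmann form borrowed from statistical physics) is also an assumption of this sketch rather than something the text above prescribes.

```python
import numpy as np

def energy(y, x):
    """Hypothetical toy energy: low when y is close to 2 * x."""
    return (y - 2.0 * x) ** 2

def predict(x, candidates):
    """Inference: fix the observed x and return the candidate y with the lowest energy."""
    energies = np.array([energy(y, x) for y in candidates])
    return candidates[int(np.argmin(energies))], energies

def gibbs_probabilities(energies, temperature=1.0):
    """Turn energies into normalized probabilities via exp(-E / T) / Z (assumed form)."""
    logits = -np.asarray(energies, dtype=float) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

candidates = [0.0, 1.0, 2.0, 3.0, 4.0]
best_y, energies = predict(x=1.0, candidates=candidates)
probs = gibbs_probabilities(energies)
print(best_y)  # the lowest-energy candidate
print(probs)   # the highest probability falls on that same candidate
```

Note that the prediction step never needs the normalizing constant: picking the lowest-energy candidate only requires comparing energies, which is one reason unnormalized scores are convenient to work with.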
To illustrate the inner workings of EBMs, consider an image classification problem in which a model needs to classify a given picture as one of five categories: human, animal, airplane, car or truck. The model will use an energy function E(Y, X) in which X represents the pixels of the image and Y is a discrete label describing the object in the image. Given an input X, the model should produce an output Y that minimizes the…