Learning by Competition: Understanding Adversarial Neural Networks Part I
This weekend I was reading some interesting research about new techniques to analyze adversarial neural networks, and it occurred to me that it might be a good idea to revisit some of the fundamentals of this technique, which has been gaining a lot of traction in the deep learning ecosystem. In the past, I’ve briefly written about adversarial neural networks but without getting too deep into the details, so there is no risk of sounding too repetitive ;).
Adversarial neural networks were created to address one of the most important challenges of modern deep learning applications: how to train models with high-quality data while keeping the resources manageable. This type of problem has been at the center of machine learning and deep learning almost since their inception. One of the reasons training deep learning systems is so challenging is that it typically relies on pre-processed data. That approach contrasts with the way humans acquire and generate knowledge and, in particular, with one of our most magical cognitive abilities: intuition.
The Intuition Metaphor
Conceptually, intuition is the ability to understand knowledge without explicit conscious reasoning. The unconscious aspect of intuition is so important that some neuroscientists even refer to this phenomenon as unconscious cognition. As humans, every time we are presented with a new piece of knowledge we can “intuitively” derive variations of it without the need to reason deeply about it. In other words, we are generating new knowledge without conscious reasoning. Wouldn’t it be nice if we could apply similar concepts to deep learning systems?
Simulating intuition in deep learning models can be seen as the combination of two often competing processes: generating new knowledge and discriminating facts to assert its quality. These two processes correspond to the two main schools of thought in deep learning classification models before adversarial neural networks came along.
Discriminative vs. Generative Models
To understand adversarial neural networks, it might be useful to revisit the two main approaches to classification in traditional machine learning systems:
- A discriminative model learns a function that maps the input data (x) to some desired output class label (y). In probabilistic terms, it directly learns the conditional distribution P(y|x).
- A generative model tries to learn the joint probability of the input data and labels simultaneously, i.e. P(x,y). This can be converted to P(y|x) for classification via Bayes' rule, but the generative ability can be used for something else as well, such as creating likely new (x, y) samples.
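To make the distinction concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the toy weather dataset and every function name are hypothetical, and simple counting stands in for the far richer models used in practice. The generative side estimates P(x,y) and derives P(y|x) through Bayes' rule; the discriminative side estimates P(y|x) directly.

```python
import random
from collections import Counter

# Hypothetical toy dataset: x is a weather observation, y is a class label.
data = [
    ("sunny", "play"), ("sunny", "play"), ("sunny", "stay"),
    ("rainy", "stay"), ("rainy", "stay"), ("rainy", "play"),
    ("sunny", "play"), ("rainy", "stay"),
]

def fit_joint(pairs):
    """Generative view: estimate the joint distribution P(x, y) from counts."""
    counts = Counter(pairs)
    total = len(pairs)
    return {xy: c / total for xy, c in counts.items()}

def conditional_from_joint(joint, x):
    """Bayes' rule: P(y | x) = P(x, y) / P(x), with P(x) = sum over y of P(x, y)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

def fit_conditional(pairs):
    """Discriminative view: estimate P(y | x) directly, never modeling P(x)."""
    by_x = {}
    for x, y in pairs:
        by_x.setdefault(x, Counter())[y] += 1
    return {
        x: {y: c / sum(cnt.values()) for y, c in cnt.items()}
        for x, cnt in by_x.items()
    }

def sample_from_joint(joint, n, seed=0):
    """Only the generative model can do this: draw new (x, y) pairs."""
    rng = random.Random(seed)
    outcomes = list(joint.keys())
    weights = list(joint.values())
    return rng.choices(outcomes, weights=weights, k=n)

joint = fit_joint(data)
p_gen = conditional_from_joint(joint, "sunny")   # P(y | x="sunny") via Bayes' rule
p_disc = fit_conditional(data)["sunny"]          # P(y | x="sunny") learned directly
new_samples = sample_from_joint(joint, 5)        # new (x, y) pairs from P(x, y)

print(p_gen, p_disc, new_samples)
```

On this tiny discrete dataset both routes agree on P(y|x), but only the joint model can be sampled to produce new (x, y) pairs, which is the ability the second bullet describes.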
Both discriminative and generative models have strengths and weaknesses. Discriminative algorithms tend to perform incredibly well in classification tasks involving high-quality datasets. Generative models, however, have the unique advantage that they can create new data similar to existing data, and they operate very efficiently in environments where labeled data is scarce.
For years, the entire space of classification algorithms could be segmented into generative and discriminative models. That changed in 2014 when a group of researchers led by deep learning luminary Ian Goodfellow (now with OpenAI) published a paper advocating for a new type of model that combines both generative and discriminative techniques to generate high-quality knowledge. They called the new technique generative adversarial networks (GANs), and that will be the subject of our next post.