Convolutional Neural Networks for the Rest of Us Part III: Benefits and Motivation
This is the third part of an essay series that explains the fundamentals of convolutional neural networks (CNNs). The first part detailed some of the main concepts behind CNNs and presented a few analogies based on real-world scenarios. Continuing the journey, this article focuses on some of the main benefits and motivations behind CNNs when applied to deep learning architectures.
CNNs are a key element of modern deep learning technologies. If you have been actively using frameworks such as TensorFlow, Caffe2, Theano, or any of the dozens of other deep learning frameworks on the market, chances are you have already encountered CNNs in some form or fashion. This raises the question: what specific benefits have made CNNs so popular compared to other neural network alternatives?
The main motivation behind the emergence of CNNs in deep learning has been to address many of the limitations that traditional neural networks face in these scenarios. When used in areas like image classification, traditional fully-connected neural networks simply don't scale well due to their disproportionately large number of connections. CNNs bring a few new ideas that help improve the efficiency of deep neural networks. Let's explore some of the fundamental principles leveraged by CNNs:
1 — Sparse Representations
Let's assume that you are working on an image classification problem that involves analyzing large pictures that are millions of pixels in size. A traditional neural network models the knowledge using matrix multiplication operations that involve every input and every parameter, which easily results in tens of billions of computations. Recall that CNNs are based on convolution operations between an input tensor and a kernel tensor. It turns out that the kernel in a convolution tends to be drastically smaller than the input, which reduces the number of computations required to train the model or to make predictions. In our sample scenario, a CNN would focus only on relevant local features of the input image, requiring far fewer parameters in the convolution. The result can be billions of operations fewer than a traditional fully-connected neural network.
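The scale of this difference is easy to see with some back-of-the-envelope arithmetic. The sketch below uses a hypothetical one-megapixel image and a 3x3 kernel (both illustrative assumptions, not from the original text) to compare the weight counts involved:

```python
# Hypothetical sizes for illustration: a one-megapixel grayscale image.
image_height, image_width = 1000, 1000
n_inputs = image_height * image_width          # 1,000,000 input values

# A fully-connected layer producing an output of the same size needs
# one weight per (input, output) pair.
fc_params = n_inputs * n_inputs                # 10^12 weights

# A convolutional layer only learns the kernel, e.g. a 3x3 filter.
kernel_height, kernel_width = 3, 3
conv_params = kernel_height * kernel_width     # 9 weights

print(fc_params)    # 1000000000000
print(conv_params)  # 9
```

Even allowing for many convolutional filters per layer, the parameter count stays many orders of magnitude below the fully-connected case.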
2 — Parameter Sharing
Another important optimization technique used in CNNs is known as parameter sharing. Conceptually, parameter sharing simply refers to the fact that CNNs reuse the same parameters across different positions of the input. More specifically, parameter sharing means that the same weight parameters are applied at every position of the input, which allows the model to learn a single set of weights rather than a separate set for every location. Parameter sharing in CNNs typically results in massive savings in memory compared to traditional models.
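A minimal 1-D convolution makes the idea concrete: the same three kernel weights are reused at every position of the input, so the model only ever learns three numbers. (The input and kernel values below are made up for illustration.)

```python
import numpy as np

def conv1d(x, kernel):
    """Slide a single shared kernel across x (no padding, stride 1)."""
    k = len(kernel)
    # The SAME `kernel` weights are applied at every window position.
    return np.array([np.dot(x[i:i + k], kernel)
                     for i in range(len(x) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([1.0, 0.0, -1.0])   # an edge-detector-like filter

print(conv1d(x, kernel))  # [-2. -2. -2.]
```

A fully-connected layer over the same input would instead learn a separate weight for every (input, output) pair, with nothing shared between positions.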
3 — Equivariance
Equivariance is my favorite property of CNNs and one that follows from parameter sharing. Conceptually, a function is considered equivariant if, when the input changes in a certain way, the output changes in the same way. Using mathematical notation, a function f(x) is equivariant to a transformation g() if f(g(x)) = g(f(x)). It turns out that convolution is equivariant to translation: if we shift the input, the output shifts by the same amount, which means we can predict how such changes in the input will be reflected in the output.
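We can check the identity f(g(x)) = g(f(x)) directly, with f a convolution and g a translation. The sketch below uses a circular 1-D convolution so the identity holds exactly at the boundaries; the signal, kernel, and shift amount are illustrative assumptions.

```python
import numpy as np

def circular_conv1d(x, kernel):
    """Correlate the kernel with x, wrapping around at the edges."""
    n, k = len(x), len(kernel)
    return np.array([sum(kernel[j] * x[(i + j) % n] for j in range(k))
                     for i in range(n)])

def shift(x, steps=2):
    """g(): translate the signal by `steps` positions (with wrap-around)."""
    return np.roll(x, steps)

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0])
kernel = np.array([0.5, 0.5])   # a simple smoothing filter

shifted_then_convolved = circular_conv1d(shift(x), kernel)   # f(g(x))
convolved_then_shifted = shift(circular_conv1d(x, kernel))   # g(f(x))

print(np.allclose(shifted_then_convolved, convolved_then_shifted))  # True
```

Shifting the input and then convolving produces exactly the same result as convolving and then shifting, which is what equivariance to translation promises.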
These are some of the main theoretical benefits behind CNNs. In the next part of this series, we will review specific CNN architectures that are commonly used in deep learning models.