With the rapid rise of deep learning, we constantly see academic terms used by the mainstream media without any insight into the underlying concepts, which ends up causing a lot of confusion. Convolutional neural networks (CNNs) are one of those theoretical terms that you regularly find in deep learning articles and that can be confusing for the average reader. Without going too deep into the theory, I thought I’d take a swing at explaining some of the concepts behind CNNs and how they relate to deep learning applications.
CNNs have risen to popularity with the emergence of computer vision scenarios. However, the theory behind CNNs dates back to the late 1980s. Conceptually, CNNs are a specific type of neural network that relies on a mathematical operation called convolution. In simple terms, a convolution is an operation that involves two tensors (multi-dimensional arrays), known as the input and the kernel respectively. The objective of the convolution operation is to produce a new tensor that filters out noise in the input. In the context of deep learning models, convolutions can help improve the computational efficiency of a neural network. Typically, CNNs use convolution as a more efficient alternative to general matrix multiplication across nodes.
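To make the operation concrete, here is a minimal sketch of a two-dimensional convolution written with plain NumPy. The function name `convolve2d` and the averaging kernel are illustrative choices, not part of any framework; deep learning libraries implement the same idea far more efficiently.

```python
import numpy as np

def convolve2d(input_tensor, kernel):
    """Slide the kernel over the input and sum the element-wise
    products at each position (the "valid" windows only). This is
    what deep learning frameworks commonly call a convolution."""
    ih, iw = input_tensor.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    output = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = input_tensor[i:i + kh, j:j + kw]
            output[i, j] = np.sum(window * kernel)
    return output

# A 3x3 averaging kernel smooths each 3x3 neighborhood of a 5x5 input.
image = np.arange(25, dtype=float).reshape(5, 5)
smoothing_kernel = np.ones((3, 3)) / 9.0
result = convolve2d(image, smoothing_kernel)
print(result.shape)  # (3, 3): the output tensor is smaller than the input
```

Notice that the output shrinks relative to the input: each output cell summarizes an entire neighborhood of the input, which is one way convolutions reduce the amount of computation downstream layers have to do.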
If you are already confused, do not feel bad ;); these concepts are really complex to explain outside the realm of mathematics. Let’s try to use an example to put CNNs in a real-world context. Suppose that we are monitoring an oil field using a large number of sensors. Each sensor produces telemetry data about different parameters such as temperature, terrain density, wind speed and many other relevant metrics. The measurements are produced every few milliseconds, but they can be all over the place because the weather conditions in the region affect the accuracy of the sensors. In other words, the data is noisy. To estimate more accurate telemetry information, we can try calculating averages of the data produced by sensors in the same area. However, that technique is not very effective, as it assumes that all sensors in all areas are equally accurate regardless of their historical performance, which is rarely the case in the real world. Alternatively, we can introduce a more sophisticated operation that applies weights and other statistical artifacts to the data. That’s a convolution, in which the original telemetry dataset is the input and the weighted tensor is the kernel.
From a historical standpoint, CNNs are, arguably, the most efficient type of deep learning model inspired by functions of the human brain. More specifically, CNNs have been inspired by the vision system in humans and other mammals.
The human brain contains an area known as the primary visual cortex (PVC) that is a key element of our vision system. Anatomically, the PVC receives visual signals from the optic nerve, responds to specific stimuli and passes its responses to other areas of the visual system. The PVC’s network structure, as well as its different types of neurons, was the historic inspiration for the creation of CNNs. Structurally, the PVC region is organized as a spatial map that mimics the retina, guaranteeing that only specific areas are activated by individual signals. CNNs are typically organized as two-dimensional maps that emulate the PVC’s structure. Similarly, the response of certain neurons in the PVC region is not affected by minor changes in elements of the input signal such as lighting and position. CNNs leverage the convolution operation to achieve the same goal.
This should give you a basic idea of the goals, mathematical concepts and historical influences behind CNNs. In the next part of this article, we will take a deeper dive into CNNs in the context of deep learning models.