Improving Deep Learning Algorithms: Optimization vs. Regularization
Implementing machine learning and deep learning algorithms is different from writing any other type of software program. While most code goes through the traditional authoring, compilation/interpretation, testing and execution lifecycle, deep learning models live through a never-ending lifecycle of testing and improvement. Most people generically refer to that part of the lifecycle as optimization but, in reality, it also includes another important area of deep learning theory: regularization. In order to understand the role that optimization and regularization play in deep learning models, we should start by understanding how those models are composed.
Anatomy of a Deep Learning Model
What is a deep learning algorithm? Obviously, we know it includes a model, but it is not just that, is it? Using pseudo-mathematical notation, we can define a deep learning algorithm with the following equation:
DL(x) = Model(x) + Cost_Function(Model(x)) + Input_Data_Set(x) + Optimization(Cost_Function(x))
Using this conceptual equation, we can represent any deep learning algorithm as a function of an input data set, a cost function, a deep neural network model and an optimization process. In the context of this article, we are focusing on the optimization processes.
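The conceptual equation above can be sketched as code. This is a minimal illustration, not a real framework API: the model is a toy one-parameter linear function, the cost function is mean squared error, and the optimization process is plain gradient descent with a finite-difference gradient estimate. All names here are invented for illustration.

```python
def model(x, w):
    """A toy 'model': a single weight applied to the input."""
    return w * x

def cost_function(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def optimize(inputs, targets, w, lr=0.01, steps=100):
    """Minimize the cost iteratively by nudging w against its gradient."""
    eps = 1e-6
    for _ in range(steps):
        # Finite-difference estimate of d(cost)/d(w)
        cost_plus = cost_function([model(x, w + eps) for x in inputs], targets)
        cost_minus = cost_function([model(x, w - eps) for x in inputs], targets)
        grad = (cost_plus - cost_minus) / (2 * eps)
        w -= lr * grad
    return w

# Input data set where the true relationship is y = 2x
inputs, targets = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w = optimize(inputs, targets, w=0.0)  # w converges toward 2.0
```

Each piece of the equation maps to one component: the input data set, the model, the cost function, and the optimization loop that ties them together.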
What makes those processes so challenging in deep learning systems? One word: size. Deep neural networks include a large number of layers and hidden units that can also include many nodes. That level of complexity directly translates into millions of interconnected nodes, which makes for an absolute optimization nightmare.
When thinking about improving a deep learning model, you should focus the efforts in two main areas:
a) Reducing the cost function.
b) Reducing the generalization error.
Those two subjects have become broad areas of research in the deep learning ecosystem known as optimization and regularization, respectively. Let’s look at both definitions in a bit more detail.
The role of regularization is to modify a deep learning model so that it performs well on inputs outside the training dataset. Specifically, regularization focuses on reducing the test or generalization error without affecting the initial training error.
The field of deep learning has helped to create many new regularization techniques. Most of them can be summarized as functions that optimize estimators. Very often, regularization techniques optimize estimators by reducing their variance without increasing the corresponding bias (read my previous article about bias and variance). Many times, finding the solution to a deep learning problem is not about creating the best model but a model that regularizes well under the right environment.
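One of the most common regularization techniques is L2 regularization (weight decay): a penalty proportional to the squared weights is added to the training cost, which discourages large weights and tends to reduce the estimator's variance. A minimal sketch, with illustrative function names and a toy penalty coefficient:

```python
def mse(predictions, targets):
    """Mean squared error over the training examples."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def regularized_cost(weights, predictions, targets, lam=0.1):
    """Training cost plus an L2 penalty that shrinks the weights.

    lam controls the regularization strength: higher values trade
    a slightly higher training error for lower variance.
    """
    l2_penalty = lam * sum(w ** 2 for w in weights)
    return mse(predictions, targets) + l2_penalty
```

The optimizer then minimizes this combined cost instead of the raw training error, which is how regularization and optimization end up intertwined in practice.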
There are many types of optimizations in deep learning, but the most relevant are focused on reducing the cost function of a model. Those techniques typically operate by estimating the gradient at different nodes and trying to minimize the cost iteratively. Among the many optimization algorithms in the deep learning space, stochastic gradient descent (SGD) has become the most popular, with countless implementations in mainstream deep learning frameworks (see my previous article about SGD). It is also common to find many variations of SGD, like SGD with Momentum, that work better on specific deep learning algorithms.
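The momentum variation mentioned above can be sketched in a few lines. This toy example assumes a simple quadratic cost (w - 3)^2 whose gradient is known in closed form; in a real framework the gradient would come from backpropagation, and the names here are illustrative.

```python
def grad(w):
    """Gradient of the toy cost (w - 3)^2."""
    return 2.0 * (w - 3.0)

def sgd_momentum(w, lr=0.1, beta=0.9, steps=200):
    """Update w using an exponentially decaying velocity term.

    The velocity accumulates past gradients, which smooths the updates
    and speeds progress along consistent gradient directions.
    """
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity - lr * grad(w)
        w += velocity
    return w

w = sgd_momentum(w=0.0)  # w converges toward the minimum at 3.0
```

Setting beta to 0 recovers plain SGD; values near 0.9 are a common default in practice.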
What we generally refer to as optimization in deep learning models is really a constant combination of regularization and optimization techniques. For deep learning practitioners, mastering regularization and optimization is as important as understanding the core algorithms, and it certainly plays a key role in real-world deep learning solutions.