Optimization is one of the broadest areas of research in the deep learning space. In previous articles, I explained the differences between optimization and regularization, two of the fundamental techniques used to improve deep learning models. There are several types of optimization in deep learning algorithms, but the most interesting ones focus on reducing the value of cost functions.
When we say that optimization is one of the key areas of deep learning, we are not exaggerating. In real-world deep learning implementations, data scientists often spend more time refining and optimizing models than building new ones. What makes deep learning optimization such a difficult endeavor? To answer that, we need to understand some of the principles behind this type of optimization.
Some Basics of Optimization in Deep Learning Models
The core of deep learning optimization relies on trying to minimize the cost function of a model without affecting its training performance. That type of optimization problem contrasts with the general optimization problem, in which the objective is simply to minimize a specific indicator without being constrained by the performance of other elements (e.g., training).
Most optimization algorithms in deep learning are based on gradient estimations (see my previous article about gradient-based optimization). In that context, optimization algorithms try to reduce the gradient of specific cost functions evaluated against the training dataset. There are different categories of optimization algorithms depending on the way they interact with the training dataset. For instance, algorithms that use the entire training set at once are called deterministic. Other techniques that use one training example at a time have come to be known as online algorithms. Similarly, algorithms that use more than one but less than the entire training dataset during the optimization process are known as minibatch stochastic, or simply stochastic. The most famous method of stochastic optimization, which is also the most common algorithm in deep learning solutions, is known as stochastic gradient descent (SGD) (read my previous article about SGD).
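The minibatch idea above can be sketched in a few lines. This is a minimal, hypothetical example (the variable names `X`, `y`, `w`, `lr`, and `batch_size` are illustrative, not from the article): at each step, the gradient of a mean-squared-error cost is estimated on a small random batch rather than the whole training set.

```python
import numpy as np

# Synthetic linear-regression data (illustrative, not from the article)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(3)          # parameters to learn
lr, batch_size = 0.1, 32  # learning rate and minibatch size

for epoch in range(50):
    idx = rng.permutation(len(X))        # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        # Gradient of the MSE cost estimated on this minibatch only
        grad = 2 * X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
        w -= lr * grad

print(w)  # approaches true_w
```

Setting `batch_size = 1` would make this an online algorithm, while `batch_size = len(X)` would make it deterministic (full-batch) gradient descent.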
Regardless of the type of optimization algorithm used, the process of optimizing a deep learning model is a careful path full of challenges.
Common Challenges in Deep Learning Optimization
There are plenty of challenges in deep learning optimization but most of them are related to the nature of the gradient of the model. Below, I’ve listed some of the most common challenges in deep learning optimization that you are likely to run into:
a) Local Minima: The grandfather of all optimization problems, local minima are a permanent challenge in the optimization of any deep learning algorithm. The local minima problem arises when the gradient descends into one of many local minima that are different from, and uncorrelated with, the global minimum of the cost function.
b) Flat Regions: In deep learning optimization, flat regions are common areas where the gradient is close to zero, often representing a local minimum along some dimensions and a local maximum along others. That duality often causes the gradient to get stuck.
c) Inexact Gradients: There are many deep learning models in which the cost function is intractable, which forces an inexact estimation of the gradient. In those cases, the inexact gradients introduce a second layer of uncertainty into the model.
d) Local vs. Global Structures: Another very common challenge in the optimization of deep learning models is that local regions of the cost function don't correspond with its global structure, producing a misleading gradient.
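The local minima problem from the list above is easy to reproduce on a toy one-dimensional function. The example below is purely illustrative (the function and starting points are my own choices, not from the article): f(x) = x^4 - 3x^2 + x has a global minimum near x ≈ -1.30 and a shallower local minimum near x ≈ 1.13, and plain gradient descent ends up in whichever basin it starts in.

```python
# f(x) = x^4 - 3x^2 + x (illustrative non-convex function)
def grad(x):
    return 4 * x**3 - 6 * x + 1  # derivative of f

def descend(x, lr=0.01, steps=2000):
    """Plain gradient descent from starting point x."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(descend(-0.5))  # ≈ -1.30: reaches the global minimum
print(descend(1.0))   # ≈  1.13: trapped in the local minimum
```

Both runs satisfy the stopping condition (a near-zero gradient), yet only one finds the global minimum; the misleading local structure the author describes is exactly this dependence on the starting point.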