Knowledge Tuning: Hyperparameters in Machine Learning Algorithms Part I
Model optimization is one of the toughest challenges in the implementation of machine learning solutions. Entire branches of machine learning and deep learning theory have been dedicated to the optimization of models. Typically, we think about model optimization as a process of regularly modifying the code of the model in order to minimize the testing error. However, the art of machine learning optimization often entails fine-tuning elements that live outside the model but that can heavily influence its behavior. Machine learning refers to those hidden elements as hyperparameters, and they are one of the most critical components of any machine learning application.
Hyperparameters are settings that can be tuned to control the behavior of a machine learning algorithm. Conceptually, hyperparameters can be considered orthogonal to the learning model itself: although they live outside the model, they directly shape its behavior.
The criteria for what defines a hyperparameter are incredibly abstract and flexible. Sure, there are well-established hyperparameters such as the number of hidden units or the learning rate of a model, but there is also an arbitrary number of settings that can play the role of hyperparameters for specific models. In general, hyperparameters are very specific to the type of machine learning model you are trying to optimize. Sometimes, a setting is modeled as a hyperparameter because it is not appropriate to learn it from the training set. A classic example is settings that control the capacity of a model (the spectrum of functions that the model can represent). If a machine learning algorithm learned those settings directly from the training set, it would likely try to maximize them, which would cause the model to overfit (poor generalization).
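The capacity pitfall can be sketched with a toy example, using polynomial degree as a stand-in for a capacity hyperparameter (the data and names here are illustrative, not from any particular library's API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy line. The true relationship is simple (degree 1).
x = rng.uniform(-1, 1, 20)
y = 2 * x + rng.normal(0, 0.3, 20)

def train_error(degree):
    """Mean squared error on the training set for a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Training error only goes down as capacity (degree) grows...
errors = {d: train_error(d) for d in (1, 3, 9)}
# ...so "learning" the degree from the training set would always pick the
# largest one, even though degree 1 matches the true data-generating process.
```

This is why capacity-controlling settings are held outside the training loop: judged on training error alone, more capacity always looks better.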
If hyperparameters are not learned from the training set, then how does a model learn them? Remember the classic rule in machine learning of splitting the input dataset in an 80/20 percent ratio between the training set and the validation set, respectively? Well, part of the role of that 20% validation set is to guide the selection of hyperparameters. Technically, the validation set is used to “train” the hyperparameters prior to optimization.
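A minimal sketch of that idea, again using polynomial degree as the hyperparameter being "trained" on a held-out 20% split (the data, candidate values, and function names are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data; the underlying signal is quadratic.
x = rng.uniform(-1, 1, 100)
y = x ** 2 + rng.normal(0, 0.1, 100)

# The classic 80/20 split: the model trains on 80% of the data,
# and hyperparameters are judged on the held-out 20% validation set.
x_train, x_val = x[:80], x[80:]
y_train, y_val = y[:80], y[80:]

def val_error(degree):
    """Fit on the training set, then score on the validation set."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))

# "Train" the hyperparameter: keep the degree with the lowest validation error.
candidate_degrees = [1, 2, 4, 8]
best_degree = min(candidate_degrees, key=val_error)
```

Because the validation set is never seen during fitting, a degree that merely memorizes the training noise is penalized rather than rewarded.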
Some Examples of Hyperparameters
The number and diversity of hyperparameters in machine learning algorithms is very specific to each model. However, there are some classic hyperparameters that we should always keep an eye on and that should help you think about this aspect of machine learning solutions:
— Learning Rate: The mother of all hyperparameters, the learning rate controls how much the model's parameters are adjusted at each step of the optimization process.
— Number of Hidden Units: A classic hyperparameter in deep learning algorithms, the number of hidden units is key to regulating the representational capacity of a model.
— Convolution Kernel Width: In convolutional neural networks (CNNs), the kernel width influences the number of parameters in a model which, in turn, influences its capacity.
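The learning rate in the list above can be illustrated with a bare-bones gradient descent sketch (a toy one-dimensional objective; the function names and values are illustrative only):

```python
# Gradient descent on f(w) = w**2, whose minimum is at w = 0.
# The learning rate is fixed before training begins: it is a
# hyperparameter, not something the descent itself learns.
def gradient_descent(learning_rate, steps=50, w=1.0):
    for _ in range(steps):
        grad = 2 * w                  # derivative of w**2 at the current w
        w = w - learning_rate * grad  # each update is scaled by the learning rate
    return w

w_good = gradient_descent(learning_rate=0.1)  # steadily approaches the minimum
w_bad = gradient_descent(learning_rate=1.1)   # overshoots and diverges
```

Even on this trivial objective, the same algorithm converges or diverges depending solely on the learning rate, which is why it is often the first hyperparameter to tune.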
Now that we know the importance of hyperparameters, the next step is to learn how to optimize them. That will be the subject of the next post.