Knowledge Tuning: Hyperparameters in Machine Learning Algorithms Part II
This is the second part of an article that explores the role of hyperparameters in machine learning models. In the previous article, we presented the fundamental concepts behind hyperparameters and their relevance to validation sets. In this part, we are going to focus on the techniques for selecting and optimizing hyperparameters.
The process of selecting hyperparameters is a key aspect of any machine learning solution. Most machine learning algorithms explicitly define specific hyperparameters that control different aspects such as memory or cost of execution. However, additional hyperparameters can be defined to adapt an algorithm to a specific scenario. Data science technologists typically spend quite a bit of time tuning hyperparameters in order to achieve the best performance for a particular model.
When it comes to selecting and optimizing hyperparameters, there are two basic approaches: manual and automatic selection. Both approaches are technically viable, and the decision typically represents a tradeoff: manual selection requires a deep understanding of the machine learning model, while automatic selection algorithms carry a high computational cost.
Selecting Hyperparameters Manually
The main objective of manual hyperparameter selection is to tune the effective capacity of a model to match the complexity of the target task. Imagine that you are training to climb Mount Everest. During the grueling training process, you want to subject your body to all sorts of routines so that it can perform at high altitude, in low temperatures and under low barometric pressure. However, you don’t want to push your body so far that it shuts down. Similarly, you need to carry enough provisions and tools to handle all sorts of unexpected situations, but you don’t want to carry so much weight that it affects your agility on the mountain. In other words, the objective of the training process is to help you maximize your effective capacity for the task at hand.
Machine learning models also have a notion of effective capacity. In that context, the effective capacity of a machine learning algorithm is determined by three main factors:
1) The representational capacity of the algorithm, or the set of hypotheses that satisfy the training dataset.
2) The effectiveness of the algorithm at minimizing its cost function.
3) The degree to which the cost function and training process minimize the test error.
Sounds confusing? To see how these factors are related, let’s take a deep learning model with many layers and many hidden units. By definition, that type of model has a large representational capacity because it can easily model complex functions. However, our model might not be able to learn all those functions given the constraints of the training set. Similarly, many of the potential functions might conflict with the regularization strategies the model uses to minimize the test error. Tuning different hyperparameters is the key to finding the optimal balance between those factors.
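A small sketch can make the capacity tradeoff concrete. It does not use a deep network. Instead it swaps in a much simpler k-nearest-neighbors regressor, where the hyperparameter k plays the role of the capacity knob: a small k gives the model a lot of freedom, a large k gives it very little. The task (noisy samples of a sine curve), the noise level and all function names are assumptions made purely for illustration.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mean_squared_error(train, data, k):
    """Mean squared error of the k-NN predictor on the given data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def make_data(n, rng):
    # Hypothetical task: noisy samples of sin(x) on [0, 2*pi].
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0 * math.pi)
        points.append((x, math.sin(x) + rng.gauss(0.0, 0.3)))
    return points

rng = random.Random(0)
train_set = make_data(40, rng)
validation_set = make_data(40, rng)

# Manual tuning: try a handful of values of k and compare the two errors.
results = {}
for k in (1, 3, 7, 15, 40):
    train_error = mean_squared_error(train_set, train_set, k)
    validation_error = mean_squared_error(train_set, validation_set, k)
    results[k] = (train_error, validation_error)
    print(f"k={k:2d}  train MSE={train_error:.3f}  validation MSE={validation_error:.3f}")
```

With k=1 the model reproduces every training point exactly, so its training error is zero even though it has memorized the noise. With k=40 it predicts the same average everywhere and cannot follow the curve at all. Manual tuning amounts to reading a table like this one and picking the setting where the validation error, not the training error, looks best.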
Automatic Hyperparameter Optimization
Optimizing hyperparameters manually can be an exhausting endeavor. To address that challenge, we can use algorithms that automatically explore a set of candidate hyperparameter values and attempt to optimize them while hiding that complexity from the developer. Algorithms such as Grid Search and Random Search have become prominent when it comes to hyperparameter inference and optimization.
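To give a feel for how the two algorithms differ, here is a minimal sketch of both in plain Python. The validation_error function is a hypothetical stand-in for "train the model with these hyperparameters and measure the error on the validation set", and the hyperparameter names learning_rate and num_layers are assumptions chosen only for illustration.

```python
import itertools
import random

def validation_error(learning_rate, num_layers):
    # Hypothetical stand-in for training a model and scoring it.
    # By construction the minimum is at learning_rate=0.1, num_layers=3.
    return (learning_rate - 0.1) ** 2 + 0.01 * (num_layers - 3) ** 2

def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    best_params, best_error = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = validation_error(**params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error

def random_search(samplers, num_trials, seed=0):
    """Evaluate num_trials configurations drawn at random."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(num_trials):
        params = {name: sample(rng) for name, sample in samplers.items()}
        error = validation_error(**params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error

# Grid Search: a fixed list of values per hyperparameter (4 x 4 = 16 runs).
grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "num_layers": [1, 2, 3, 4]}
grid_best, grid_error = grid_search(grid)

# Random Search: a sampling rule per hyperparameter, same budget of 16 runs.
samplers = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform draw
    "num_layers": lambda rng: rng.randint(1, 4),
}
random_best, random_error = random_search(samplers, num_trials=16)

print("grid search:  ", grid_best, grid_error)
print("random search:", random_best, random_error)
```

Both functions spend the same budget of 16 evaluations. Grid Search is exhaustive over the values you list, so its cost multiplies with every hyperparameter you add. Random Search keeps the budget fixed and tries a different value of every hyperparameter on each trial, which is why it is often preferred when only a few of the hyperparameters really matter.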