Practical Deep Learning: Selecting the Right Model and Gathering Training Data Part I
Building deep learning applications in the real world is a never-ending process of selecting and refining the elements of a specific solution. Among those elements, choosing the correct model and the right structure for the training dataset are arguably the two most important decisions data scientists make when architecting deep learning solutions. How do we decide which deep learning model to use for a specific problem? How do we know whether we are using the correct training dataset or should gather more data? Those questions recur across every stage of the lifecycle of a deep learning application. Even though there is no magic answer to them, several ideas can guide your decision-making process. Let’s start with the selection of the correct deep learning model.
Selecting a Baseline Model
The first thing to figure out when exploring an artificial intelligence (AI) problem is whether it is a deep learning problem at all. Many AI scenarios are perfectly addressable using basic machine learning algorithms. However, if the problem falls into the category of “AI-complete” scenarios such as vision analysis, speech translation, natural language processing, or others of a similar nature, then we need to start thinking about how to select the right deep learning model.
Identifying the correct baseline model for a deep learning problem is a complex task that can be segmented into two main parts:
I) Select the core learning algorithm.
II) Select the optimization algorithm that complements the algorithm chosen in step I.
Most deep learning algorithms are closely tied to the structure of the training dataset. Again, there is no silver bullet for selecting the right algorithm for a deep learning problem, but the following design guidelines should help in the decision:
a) If the input dataset is based on images or similar topological structures, then the problem can be tackled using convolutional neural networks (CNNs) (see my previous articles about CNNs).
b) If the input is a fixed-size vector, we should be thinking of a feed-forward network with fully connected layers.
c) If the input is sequential in nature, then the problem is better suited for recurrent or recursive neural networks.
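To make guideline (b) concrete, here is a minimal sketch of the forward pass of a feed-forward network over a fixed-size input vector, written in plain NumPy. The layer sizes, ReLU activation, and random weights are illustrative assumptions, not part of the original article:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def feed_forward(x, weights, biases):
    """Forward pass through a fully connected feed-forward network.

    x: fixed-size input vector, shape (d,)
    weights/biases: one entry per layer, applied in order.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)                   # hidden layers use ReLU
    return weights[-1] @ h + biases[-1]       # linear output layer

# Toy network mapping a 4-dimensional input to a 2-dimensional output
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
biases = [np.zeros(3), np.zeros(2)]
y = feed_forward(rng.normal(size=4), weights, biases)
print(y.shape)  # (2,)
```

The key property is that every input must have the same dimensionality; variable-length inputs are what push us toward the recurrent models in guideline (c).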
Those principles apply mostly to supervised deep learning algorithms. However, plenty of deep learning scenarios can benefit from unsupervised models. In scenarios such as natural language processing or image analysis, unsupervised learning can be a useful technique for determining relevant characteristics of the input dataset and structuring it accordingly.
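As one simple, concrete example of that idea, principal component analysis (PCA) is a classic unsupervised step that surfaces the directions of highest variance in a dataset. The NumPy sketch below is an illustrative stand-in for the richer unsupervised models the article alludes to (the data shape and component count are arbitrary):

```python
import numpy as np

def pca(X, k):
    """Project data onto its top-k principal components.

    An unsupervised step: the components are the directions of
    highest variance in the input dataset, found without labels.
    """
    Xc = X - X.mean(axis=0)                        # center the data
    # SVD of the centered matrix yields the principal directions in Vt
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # shape (n_samples, k)

# 100 samples in 5 dimensions, compressed to 2 unsupervised features
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```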
In terms of the optimization algorithm, you can rarely go wrong with stochastic gradient descent (SGD) (read my previous article about SGD). Variations of SGD, such as those using momentum or learning rate decay, are very popular in the deep learning space. Adam is arguably the most popular alternative to SGD, especially when combined with CNNs.
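The momentum variant mentioned above can be sketched in a few lines. This toy example minimizes a one-dimensional quadratic; the learning rate, momentum coefficient, and target function are illustrative assumptions:

```python
def sgd_momentum(grad, w0, lr=0.1, beta=0.9, steps=200):
    """Minimize a function via gradient descent with momentum.

    v keeps an exponentially decaying accumulation of past gradients,
    which damps oscillations and accelerates progress along directions
    where the gradient is consistent.
    """
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v + grad(w)   # accumulate the momentum term
        w = w - lr * v           # take the parameter step
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_star = sgd_momentum(lambda w: 2 * (w - 3.0), w0=0.0)
print(w_star)  # converges close to 3.0
```

In real training loops the gradient is estimated from mini-batches of data rather than computed exactly, which is what makes the method "stochastic".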
Now we have an idea of how to select the right deep learning algorithm for a specific scenario. The next step is to validate the correct structure of the training dataset. We will discuss that in the next part of this article.