The process of building deep learning systems in the real world is full of theoretical and practical challenges. On the theoretical front, data scientists constantly obsess over selecting the right model for a specific scenario, along with the correct regularization and optimization techniques. In more practical terms, deep learning applications need to run and scale on the right infrastructure and sustain their performance as they grow.
Many techniques can be applied to architect deep learning systems that grow sustainably. Among them, dynamic structures have become particularly popular in the deep learning community.
The concepts behind dynamic structure models might seem trivial, but they are really difficult to implement in deep neural networks. Essentially, dynamic structure models are able to execute sub-networks of a model based on the input data. Many deep learning experts also refer to dynamic structure as conditional computation, a term that better highlights the nature of the technique.
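To make the idea concrete, here is a minimal sketch of conditional computation. The sub-networks and the routing rule (an input-norm threshold) are hypothetical stand-ins chosen for illustration; a real model would learn both the sub-networks and the routing.

```python
import numpy as np

def subnet_cheap(x):
    # Inexpensive sub-network: a single scaling (illustrative stand-in).
    return x * 0.5

def subnet_expensive(x):
    # Costlier sub-network: a nonlinear transform (illustrative stand-in).
    return np.tanh(x) + 0.1 * x

def conditional_forward(x, threshold=1.0):
    # Route the input through exactly one sub-network based on its norm;
    # the other sub-network is never executed for this input, which is
    # what makes conditional computation cheaper than running everything.
    if np.linalg.norm(x) < threshold:
        return subnet_cheap(x)
    return subnet_expensive(x)
```

The key property is that compute is spent only on the branch the input actually takes, so the cost of a forward pass no longer scales with the full size of the model.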
Deep learning architectures based on dynamic structure are particularly popular in ensemble learning systems that include various networks in the same model (see my previous articles about ensemble learning). Because dynamic structure architectures can predictably execute subsets of the network based on the input, they are relatively easier to scale as the size of the model grows. Specific implementations of dynamic structure have evolved across different areas of deep learning. An analogy to help you think about dynamic structure is to think of a student taking different subjects in a semester. Depending on the specific class that the student is planning to attend, he or she will brush up on recent lessons and focus attention on that specific topic. Dynamic structure tries to accomplish something similar for large deep learning models.
With the rapid rise in popularity of deep learning architectures, several dynamic structure methods, such as cascade classifiers and expert networks, have become increasingly popular among deep learning practitioners.
Cascade of Classifiers
The cascade of classifiers strategy is typically applied to scenarios in which we need to detect complex patterns in a dataset, such as rare objects in an image. One obvious way to solve that problem is to implement a highly sophisticated classifier, which is typically very expensive to run. Alternatively, a model can run a sequence of inexpensive classifiers that can assert that an image does not contain the rare object. The sequence of classifiers can be orchestrated so that the most basic classifiers run first while the ones with higher precision execute at the end, which reduces costs even further.
Mixture of Experts
Another popular dynamic structure method is known as mixture of experts. It uses a neural network called the gater to determine which expert network to use based on the characteristics of the input. The selection of the expert network is based on probabilistic models that assign weights to every network. When the model has a large number of expert networks, there are variations of the mixture of experts that use combinatorial gaters capable of applying more complex selection criteria.
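A minimal sketch of the gating idea, assuming toy dimensions (4 input features, 3 experts, 2 outputs) and linear experts for simplicity; in a real system both the gater and the experts would be trained networks, and the gater's weights would be learned rather than random.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical toy parameters: a linear gater and 3 linear experts.
W_gate = rng.normal(size=(3, 4))        # gater: input -> expert scores
W_experts = rng.normal(size=(3, 2, 4))  # one weight matrix per expert

def mixture_of_experts(x, top_k=1):
    # The gater assigns a probability to every expert for this input.
    probs = softmax(W_gate @ x)
    # Sparse selection: only the top-k experts are actually executed,
    # so compute per input is independent of the total expert count.
    chosen = np.argsort(probs)[-top_k:]
    y = np.zeros(2)
    for i in chosen:
        y += probs[i] * (W_experts[i] @ x)  # weighted expert output
    return y / probs[chosen].sum()          # renormalize over chosen experts
```

With `top_k=1` the output is exactly the single highest-probability expert's output; larger `top_k` blends several experts weighted by the gater's probabilities.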