Recurrent and Recursive Networks in Deep Learning Systems Part II: Bidirectional and Deep RNNs
This is the second part of an essay that explores recurrent and recursive neural networks (RNNs) in deep learning models. In the first part, we introduced recurrent neural networks as an architecture for processing sequential data, much as convolutional neural networks (CNNs) are used to handle multi-dimensional data structures. Today, I would like to discuss some variations of recurrent networks that have become popular in real-world deep learning systems.
In the last few years, recurrent neural networks have become one of the main architectures in deep learning models. Despite their popularity, however, pure recurrent networks often prove too limited for many real-world deep learning scenarios. The lack of recursive connections or backward feedback loops regularly challenges recurrent neural network implementations. To address those challenges, researchers have created variations of recurrent neural networks that are now widely implemented in popular open source deep learning frameworks. Among them, bidirectional and deep RNNs are often used in more sophisticated scenarios that deal with sequential data.
The use case for bidirectional recurrent neural networks is centered on scenarios in which the state of a node is affected by the state of nodes executed at a future time. The traditional RNN architecture is based on a very simple computation graph in which the state of the network at any given time depends solely on information about the past. Now consider a simple speech recognition scenario in which the final analysis of an audio stream at any given time depends on the interpretation of a future segment of that stream. Suppose a digital assistant inquires about your latest experience at the movies by asking "how was the movie?", to which you answer "Well…", indicating a level of uncertainty. The final analysis, however, will depend on your next statement.
Bidirectional RNNs address the future-dependency limitations of traditional recurrent networks by combining two RNNs in the same model. The first RNN moves forward through time from the beginning of the sequence, while the second RNN moves backward starting at the end of the sequence. This simple adaptation of traditional RNNs allows any hidden unit to compute a representation that depends on both the past and the future relative to a specific time window.
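The idea can be sketched in a few lines: run the same recurrence forward and backward over the sequence and concatenate the two hidden states at each time step. The following is a minimal NumPy sketch, not a production implementation; the function names, dimensions, and random initialization are all illustrative assumptions.

```python
import numpy as np

def rnn_pass(xs, Wx, Wh, b):
    """Run a simple tanh RNN over a sequence; return the hidden state at each step."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h + b)
        states.append(h)
    return states

def bidirectional_rnn(xs, fwd_params, bwd_params):
    """Concatenate forward-in-time states with backward states, realigned in time."""
    fwd = rnn_pass(xs, *fwd_params)
    bwd = rnn_pass(xs[::-1], *bwd_params)[::-1]  # reverse input, then realign output
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

# Illustrative shapes: 4-dim inputs, 3-dim hidden state, sequence length 5.
rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 4, 3, 5

def init_params():
    return (rng.normal(size=(hidden_dim, input_dim)) * 0.1,
            rng.normal(size=(hidden_dim, hidden_dim)) * 0.1,
            np.zeros(hidden_dim))

xs = [rng.normal(size=input_dim) for _ in range(T)]
outputs = bidirectional_rnn(xs, init_params(), init_params())
print(len(outputs), outputs[0].shape)  # 5 (6,)
```

Note that the output at every time step has twice the hidden dimension, because it fuses a summary of everything before that step with a summary of everything after it.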
Deep Recurrent Networks
Traditional RNNs are represented by a very basic computation graph that connects the input unit to a sequence of hidden units and the final hidden unit to the output unit. In that model, the computations performed by any of the hidden units are limited to atomic transformations, which often prove insufficient for building more sophisticated data manipulation routines. Deep recurrent networks address that challenge by decomposing the state of a unit into a multi-layer network capable of performing arbitrarily complex operations.
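One common form of this decomposition is stacking: the hidden-state sequence produced by one recurrent layer becomes the input sequence of the next. The sketch below, again a NumPy illustration with assumed names and dimensions rather than a reference implementation, shows that stacking pattern.

```python
import numpy as np

def deep_rnn(xs, layer_params):
    """Stacked RNN: each layer's hidden-state sequence feeds the layer above it."""
    seq = xs
    for Wx, Wh, b in layer_params:
        h = np.zeros(Wh.shape[0])
        out = []
        for x in seq:
            h = np.tanh(Wx @ x + Wh @ h + b)
            out.append(h)
        seq = out  # hidden states become the next layer's inputs
    return seq

# Illustrative architecture: 4-dim inputs, two 8-dim hidden layers, 3-dim top layer.
rng = np.random.default_rng(1)
dims = [4, 8, 8, 3]
params = [(rng.normal(size=(d_out, d_in)) * 0.1,
           rng.normal(size=(d_out, d_out)) * 0.1,
           np.zeros(d_out))
          for d_in, d_out in zip(dims[:-1], dims[1:])]

xs = [rng.normal(size=4) for _ in range(6)]
top = deep_rnn(xs, params)
print(len(top), top[0].shape)  # 6 (3,)
```

Each additional layer lets the network build a more abstract transformation of the sequence, at the cost of a longer path for gradients to travel during training.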
Adding depth to specific units directly expands the richness of their knowledge representation. However, it is not as trivial as it sounds: the extra depth can hurt learning performance or make optimization more difficult.
Bidirectional and deep recurrent networks are two of the most popular forms of RNNs that you will find in deep learning stacks. In the next part of this article, we will cover recursive neural networks as another technique to consider when processing sequential datasets.