Sequences
Data streams are everywhere in our lives: weather station sensors produce measurements indexed by time, financial markets generate streams of trading data, and reading comprehension itself is sequential; one can think of many others. We want to fit sequential data with a model, so we need a hypothesis set that is rich enough for tasks that require some form of memory. In the following we use $t$ as the index variable without necessarily implying any time semantics.

Dynamical systems are such rich models, where the recurrent state evolution can be represented as

$$s_t = f_t(s_{t-1}, a_t; \theta_t)$$

where $s_t$ is the evolving state, $a_t$ is an external action or control, and $\theta_t$ is a set of parameters that specify the state evolution model $f_t$. This innocent-looking equation can capture quite a lot of complexity (a minimal code sketch follows the list):

- The state space $\mathcal{S}_t$, the set of possible states, can depend on $t$.
- The action space $\mathcal{A}_t$ can similarly depend on $t$.
- Finally, the function $f_t$ that maps previous states and actions to a new state can also depend on $t$.
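To make the recursion concrete, here is a minimal sketch of unrolling such a system. The names `evolve` and `f_lin` are hypothetical, introduced only for this illustration, and for simplicity $f$ and $\theta$ are held fixed here even though in general both may depend on $t$:

```python
import numpy as np

def evolve(f, s0, actions, theta):
    """Unroll s_t = f(s_{t-1}, a_t; theta), starting from s0."""
    states = [s0]
    for a in actions:
        states.append(f(states[-1], a, theta))
    return states

# A linear state evolution s_t = A s_{t-1} + B a_t, assumed for illustration.
theta = {"A": np.array([[0.9, 0.1],
                        [0.0, 0.8]]),
         "B": np.array([[1.0],
                        [0.5]])}

def f_lin(s, a, th):
    return th["A"] @ s + th["B"] @ a

s0 = np.zeros((2, 1))            # initial state
actions = [np.ones((1, 1))] * 5  # constant control a_t = 1
trajectory = evolve(f_lin, s0, actions, theta)
```

Note that the state carries the memory: each $s_t$ summarizes the entire history of past states and actions, which is exactly the property we need for sequence models.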
RNN Architecture
The RNN architecture is a constrained implementation of the above dynamical system: RNNs implement the same function $f$ (parametrized by $\theta$) across the sequence index $t$,

$$h_t = f(h_{t-1}, x_t; \theta).$$

The state is latent and is denoted with $h_t$ to match the notation we used earlier for the DNN hidden layers. There is also no dependency of the parameters $\theta$ on $t$, which means that the network shares parameters across the sequence. We have seen parameter sharing in CNNs as well, but, if you recall, there the sharing was over the relatively small span of the filter. The most striking difference between CNNs and RNNs, however, is the recursion itself.
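The following is a minimal sketch of a vanilla RNN forward pass, assuming the common $\tanh$ cell $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b)$; the function name `rnn_forward` and the toy dimensions are assumptions made for illustration:

```python
import numpy as np

def rnn_forward(x_seq, h0, Wxh, Whh, b):
    """Vanilla RNN: h_t = tanh(Wxh @ x_t + Whh @ h_{t-1} + b).

    The same (Wxh, Whh, b) are reused at every step: this is the
    parameter sharing across the sequence described above.
    """
    h = h0
    hidden_states = []
    for x in x_seq:
        h = np.tanh(Wxh @ x + Whh @ h + b)
        hidden_states.append(h)
    return hidden_states

# Toy dimensions, assumed for illustration.
rng = np.random.default_rng(0)
d_in, d_h, T = 3, 4, 6
Wxh = rng.normal(scale=0.1, size=(d_h, d_in))
Whh = rng.normal(scale=0.1, size=(d_h, d_h))
b = np.zeros(d_h)
xs = [rng.normal(size=d_in) for _ in range(T)]
hs = rnn_forward(xs, np.zeros(d_h), Wxh, Whh, b)
```

Notice how the loop makes the recursion explicit: unlike a CNN, where each output depends on a fixed, small window of the input, every $h_t$ here depends on $h_{t-1}$ and therefore, transitively, on the entire prefix $x_1, \dots, x_t$.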