Designing deep neural networks is an art that often involves an expensive
search over candidate architectures. To overcome this cost for recurrent neural
networks (RNNs), we establish a connection between the hidden state dynamics in an RNN
and gradient descent (GD). We then integrate momentum into this framework and
propose a new family of RNNs, called {\em MomentumRNNs}. We theoretically prove
and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient
issue in training RNNs. We study the momentum long short-term memory
(MomentumLSTM) and verify its advantages in convergence speed and accuracy over
its LSTM counterpart across a variety of benchmarks. We also demonstrate that
MomentumRNN is applicable to many types of recurrent cells, including those in
the state-of-the-art orthogonal RNNs. Finally, we show that other advanced
momentum-based optimization methods, such as Adam and Nesterov accelerated
gradients with a restart, can be easily incorporated into the MomentumRNN
framework for designing new recurrent cells with even better performance. The
code is available at https://github.com/minhtannguyen/MomentumRNN.
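To make the GD analogy concrete, the following is a minimal sketch of a momentum-augmented recurrent cell; the momentum coefficient $\mu$, step size $s$, and the exact placement of the auxiliary momentum state $\mathbf{v}_t$ are illustrative assumptions rather than a definitive statement of the method:
\begin{align*}
\mathbf{v}_t &= \mu\,\mathbf{v}_{t-1} + s\,\mathbf{U}\mathbf{x}_t, \\
\mathbf{h}_t &= \sigma\!\left(\mathbf{W}\mathbf{h}_{t-1} + \mathbf{v}_t\right),
\end{align*}
which recovers the standard recurrent update $\mathbf{h}_t = \sigma(\mathbf{W}\mathbf{h}_{t-1} + \mathbf{U}\mathbf{x}_t)$ when $\mu = 0$ and $s = 1$, mirroring how heavy-ball momentum augments plain gradient descent with an accumulated velocity term.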