Recurrent Neural Networks are powerful models for sequence prediction and currently hold state-of-the-art performance on a number of modeling tasks. However, they are slow and difficult to train, which makes them impractical for many problems. In this project we investigate two possible remedies. First, we implement a novel second-order training method known as Hessian-free optimization and compare it to Gradient Descent. Second, we propose a GPU implementation that computes gradients efficiently by porting parts of the Backpropagation Through Time (BPTT) algorithm to CUDA kernels. Both approaches yield significant speedups when we apply a Recurrent Neural Network to Language Modeling.
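To make the CUDA port concrete, the minimal sketch below shows the kind of elementwise kernel such a port involves: one BPTT backward step through a tanh hidden layer. This is an illustrative assumption, not the project's actual kernel; the kernel name `bptt_tanh_backward` and the buffers `d_recur` and `d_output` are hypothetical, and the matrix products (e.g. W_hh^T times the next step's delta) are assumed to be computed separately, for instance with cuBLAS.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Elementwise part of one BPTT backward step for a tanh hidden layer:
// delta_t = (recurrent error + output error) * (1 - h_t * h_t),
// since d/dx tanh(x) = 1 - tanh(x)^2.
__global__ void bptt_tanh_backward(const float* h_t,      // hidden activations at step t
                                   const float* d_recur,  // W_hh^T * delta_{t+1} (precomputed)
                                   const float* d_output, // W_hy^T * output error at t (precomputed)
                                   float* delta_t,        // gradient w.r.t. pre-activation at t
                                   int hidden_size)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < hidden_size) {
        float g = d_recur[i] + d_output[i];
        delta_t[i] = g * (1.0f - h_t[i] * h_t[i]);
    }
}

int main()
{
    const int n = 256; // illustrative hidden-layer size
    float *h_t, *d_recur, *d_output, *delta_t;
    cudaMallocManaged(&h_t, n * sizeof(float));
    cudaMallocManaged(&d_recur, n * sizeof(float));
    cudaMallocManaged(&d_output, n * sizeof(float));
    cudaMallocManaged(&delta_t, n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        h_t[i]      = 0.5f; // placeholder hidden activations
        d_recur[i]  = 0.1f; // placeholder recurrent error term
        d_output[i] = 0.2f; // placeholder output error term
    }
    bptt_tanh_backward<<<(n + 127) / 128, 128>>>(h_t, d_recur, d_output, delta_t, n);
    cudaDeviceSynchronize();
    printf("delta_t[0] = %f\n", delta_t[0]); // expect 0.3 * (1 - 0.25) = 0.225
    cudaFree(h_t); cudaFree(d_recur); cudaFree(d_output); cudaFree(delta_t);
    return 0;
}
```

In a full BPTT pass this kernel would be launched once per timestep, walking backward through the sequence while the dense matrix products are batched on the GPU.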
| Test | RNN ID | Dataset | Commit | Perplexity | Perplexity (baseline) | Perplexity (combined) | Passing unit tests | Timestamp |
|------|--------|---------|--------|------------|-----------------------|-----------------------|--------------------|-----------|