About me

Jeppe Hallgren

Studying Computer Science
@ University of Cambridge

Hessian-free optimization of Recurrent Neural Networks for Statistical Language Modeling

Recurrent Neural Networks (RNNs) are powerful models for sequence prediction and currently hold state-of-the-art performance on a number of modeling tasks. However, they are slow and difficult to train, which makes their application to many problems impractical. In this project we investigate two possible solutions. First, we implement a second-order training method known as Hessian-free optimization and compare it to gradient descent. Second, we propose a GPU implementation that computes gradients efficiently by porting parts of the Backpropagation Through Time (BPTT) algorithm to CUDA kernels. Both approaches yield significant speedups when we apply an RNN to statistical language modeling.
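The key idea behind Hessian-free optimization is that each update solves a damped Newton system with conjugate gradient, which only ever needs Hessian-vector products, never the full Hessian. The sketch below is a minimal NumPy illustration of that idea, not the project's implementation: it approximates Hessian-vector products by finite differences of the gradient, whereas Hessian-free training of neural networks typically uses Gauss-Newton-vector products computed with the R-operator. All function names here are hypothetical.

```python
import numpy as np

def hessian_vector_product(grad_fn, w, v, eps=1e-5):
    # Finite-difference approximation: H v ~= (g(w + eps*v) - g(w - eps*v)) / (2*eps).
    # Avoids ever forming the full Hessian matrix.
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

def conjugate_gradient(hvp, b, max_iters=50, tol=1e-8):
    # Solve H x = b using CG; only Hessian-vector products are required.
    x = np.zeros_like(b)
    r = b - hvp(x)          # residual
    p = r.copy()            # search direction
    rs_old = r @ r
    for _ in range(max_iters):
        Hp = hvp(p)
        alpha = rs_old / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def hessian_free_step(grad_fn, w, damping=1.0):
    # One damped Newton step: solve (H + damping*I) d = -g, then move to w + d.
    g = grad_fn(w)
    hvp = lambda v: hessian_vector_product(grad_fn, w, v) + damping * v
    return w + conjugate_gradient(hvp, -g)
```

On a quadratic loss with `damping=0.0` a single step lands on the exact minimum; on non-convex RNN losses the damping term (adapted, e.g., by a Levenberg-Marquardt heuristic) keeps the step trustworthy.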

This page will be updated regularly as the project progresses.

Update March 25th 2015: Updated abstract above.

Update November 16th 2014: Unit testing has been set up and is now part of the build script. Results are shown below.

Update November 1st 2014: Continuous integration (build and test automation) has now been set up. The performance of the network is evaluated on each commit and the results are automatically pushed to the list below.

Test | RNN ID | Dataset | Commit | Perplexity | Perplexity (baseline) | Perplexity (combined) | Passing unit tests | Timestamp
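Perplexity, the metric reported in the table, is the exponential of the average negative log-likelihood the model assigns to the test words; lower is better, and a uniform model over V words has perplexity exactly V. A minimal sketch (the function name is my own, not from the project's code):

```python
import math

def perplexity(log_probs):
    # log_probs: natural-log probabilities the model assigned to each test word.
    # Perplexity = exp(-(1/N) * sum(log p(w_i | context))).
    n = len(log_probs)
    return math.exp(-sum(log_probs) / n)
```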