These notes borrow heavily from the CS224N notes on Language Models.
The need for neural language models


