bear appearing in multiple grammatical patterns, each of which influences its meaning. For example, we have seen the subject-verb-object pattern and the subject-verb-adjective-subject pattern. To capture such multiplicity we can use multiple attention heads, where each head learns a different pattern, such as the three shown in the figure below.
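The idea above can be sketched in code: each head applies its own scaled dot-product attention over a slice of the model dimension, so different heads are free to specialize on different grammatical patterns. This is a minimal NumPy sketch; the weight matrices, shapes, and function names are illustrative assumptions, not taken from the text.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads):
    """Scaled dot-product self-attention with n_heads independent heads.

    X: (seq_len, d_model); each W: (d_model, d_model) — illustrative shapes.
    """
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v

    # split each projection into heads: (n_heads, seq_len, d_head)
    def split(M):
        return M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)     # each head learns its own pattern
    heads = weights @ Vh                   # (n_heads, seq_len, d_head)
    # concatenate heads back to (seq_len, d_model), then mix with W_o
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o

rng = np.random.default_rng(0)
d_model, seq_len, n_heads = 8, 5, 2
X = rng.normal(size=(seq_len, d_model))
Ws = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
out = multi_head_attention(X, *Ws, n_heads)
print(out.shape)  # (5, 8)
```

Note that the heads run in parallel on disjoint `d_head`-sized slices, so adding heads does not increase the total computation for a fixed `d_model`.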


Attention Blocks with Skip Connections
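A skip (residual) connection adds a sublayer's input back to its output, which keeps gradients flowing through deep stacks of attention blocks. The sketch below assumes the common arrangement of self-attention followed by add-and-normalize; the function names and shapes are illustrative assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each position's vector to zero mean, unit variance
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def attention_block(X, W_q, W_k, W_v, W_o):
    """Self-attention sublayer wrapped with a skip connection and layer norm."""
    d_model = X.shape[-1]
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d_model)
    # stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    attn_out = (weights @ V) @ W_o
    # skip connection: add the block's input back, then normalize
    return layer_norm(X + attn_out)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 6))
Ws = [rng.normal(size=(6, 6)) * 0.1 for _ in range(4)]
out = attention_block(X, *Ws)
print(out.shape)  # (4, 6)
```

Because the output is `X + attn_out` rather than `attn_out` alone, the attention sublayer only needs to learn a correction to the identity mapping, which is what makes very deep stacks trainable.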


