
Resources
Key references: (Tsai et al., 2019; Ramachandran et al., 2017; Chen et al., 2020; Dosovitskiy et al., 2020; Jetley et al., 2018)

References
- Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A Simple Framework for Contrastive Learning of Visual Representations.
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., et al. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
- Jetley, S., Lord, N., Lee, N., & Torr, P. (2018). Learn To Pay Attention.
- Ramachandran, P., Zoph, B., & Le, Q. (2017). Searching for Activation Functions.
- Tsai, Y., Bai, S., Yamada, M., Morency, L., & Salakhutdinov, R. (2019). Transformer Dissection: A Unified Understanding of Transformer’s Attention via the Lens of Kernel.

