Lecture Goals
- Be able to explain self-attention and how it differs from the simpler attention mechanisms seen in sequence-to-sequence models
- Be able to reason about queries, keys, and values in self-attention (a minimal sketch follows this list)
- Be able to recall the key characteristics of BERT and explain how pre-trained models can be used for NLP tasks
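As a preview of the second goal, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. The function name, variable names, and dimensions are illustrative assumptions rather than material from the lecture itself.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Illustrative single-head self-attention (scaled dot-product).

    X:              (n, d_model) input token embeddings
    W_q, W_k, W_v:  (d_model, d_k) projection matrices
    """
    Q = X @ W_q  # queries: what each token is looking for
    K = X @ W_k  # keys: what each token offers for matching
    V = X @ W_v  # values: the content that gets mixed together
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n, n) similarity of every query to every key
    # softmax over keys (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix of all value vectors

# Tiny usage example with random data (hypothetical sizes)
rng = np.random.default_rng(0)
n, d_model, d_k = 4, 8, 8
X = rng.normal(size=(n, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one contextualized vector per input token
```

The key contrast with sequence-to-sequence attention is that here the queries, keys, and values all come from the same input sequence, so every token attends to every other token rather than a decoder attending to an encoder.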