Lecture Goals
- Be able to explain self-attention and how it differs from the simpler attention mechanisms seen in sequence-to-sequence models
- Be able to reason about queries, keys, and values in self-attention (a minimal sketch follows this list)
- Be able to recall the key characteristics of BERT and explain how pre-trained models can be used for NLP tasks
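As a preview of the second goal, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. The function name, variable names, and dimensions are illustrative assumptions rather than material from the lecture itself.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Illustrative single-head self-attention (scaled dot-product).

    X:              (n, d_model) input token embeddings
    W_q, W_k, W_v:  (d_model, d_k) projection matrices
    """
    Q = X @ W_q  # queries: what each token is looking for
    K = X @ W_k  # keys: what each token offers for matching
    V = X @ W_v  # values: the content that gets mixed together
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n, n) similarity of every query to every key
    # softmax over keys (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix of all value vectors

# Tiny usage example with random data (hypothetical sizes)
rng = np.random.default_rng(0)
n, d_model, d_k = 4, 8, 8
X = rng.normal(size=(n, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one contextualized vector per input token
```

The key contrast with sequence-to-sequence attention is that here the queries, keys, and values all come from the same input sequence, so every token attends to every other token rather than a decoder attending to an encoder.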