Picking the Right Pre-trained Model
BERT, RoBERTa, DistilBERT, and more.
A Whole Family
BERT sparked a family of related models. Each one tweaks the recipe to trade speed, size, or accuracy differently.
RoBERTa: Trained Harder
RoBERTa keeps BERT's design but trains longer on more data and drops one task. The payoff is higher accuracy.
All lessons in this course
- Why Context Changes Word Meaning
- Masked Language Modeling
- Embedding Sentences With BERT
- Picking the Right Pre-trained Model