NLP Academy · Lesson

Picking the Right Pre-trained Model

BERT, RoBERTa, DistilBERT, and more.

A Whole Family

BERT sparked a family of related models. Each one tweaks the recipe to strike a different balance between speed, size, and accuracy.

RoBERTa keeps BERT's architecture but trains longer, on roughly ten times more text, and drops the next-sentence prediction task. The payoff is higher accuracy on standard benchmarks.
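To make the trade-off concrete, here is a minimal sketch of how a choice between family members might be written down. The parameter counts are approximate figures for the base-size checkpoints. The `ModelCard` class, the `CARDS` list, and the `pick_model` rule (take the largest model that fits a size budget) are hypothetical names invented for illustration; they are not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    name: str
    params_millions: int  # approximate size of the base checkpoint
    uses_nsp: bool        # pre-trained with next-sentence prediction?


# Approximate figures; treat them as rough guides, not exact counts.
CARDS = [
    ModelCard("BERT-base", 110, True),
    ModelCard("RoBERTa-base", 125, False),
    ModelCard("DistilBERT", 66, False),
]


def pick_model(max_params_millions: int) -> ModelCard:
    """Hypothetical rule: return the largest model within the size budget."""
    fitting = [c for c in CARDS if c.params_millions <= max_params_millions]
    if not fitting:
        raise ValueError("no model fits the given parameter budget")
    return max(fitting, key=lambda c: c.params_millions)


if __name__ == "__main__":
    for budget in (70, 110, 200):
        print(budget, "->", pick_model(budget).name)
```

Size is only a stand-in for accuracy here. In practice the right choice also depends on the task, the latency you can tolerate, and how the model performs on your own validation data.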