NLP Academy · Lesson

Scoring Generation With ROUGE and BLEU

Measure summary and translation quality.

Why Score Generation?

Summaries and translations are free-form text, so there is no single right answer and exact-match accuracy does not apply. Instead, we need metrics that compare the output against reference text. 📏

Both ROUGE and BLEU compare your generated text to one or more human-written references by counting the words and phrases (n-grams) they share. The more the output overlaps with the references, the higher the score.
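
To make the overlap idea concrete, here is a minimal sketch that counts shared single words between an output and one reference. It is not ROUGE or BLEU itself, only the word-counting intuition behind them. The function name unigram_overlap, the whitespace tokenization, and the example sentences are assumptions made for illustration.

```python
from collections import Counter


def unigram_overlap(candidate, reference):
    """Return (precision, recall) of word overlap between two strings.

    Illustrative only: real ROUGE and BLEU add longer n-grams,
    multiple references, and other refinements.
    """
    # Naive tokenization (an assumption): lowercase, split on whitespace.
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count of each shared word,
    # so a repeated word is not credited more often than it appears
    # in the other text.
    overlap = sum((cand_counts & ref_counts).values())

    # Share of the output's words that also appear in the reference.
    precision = overlap / max(sum(cand_counts.values()), 1)
    # Share of the reference's words that the output recovered.
    recall = overlap / max(sum(ref_counts.values()), 1)
    return precision, recall


reference = "the cat sat on the mat"
print(unigram_overlap("the cat sat on a mat", reference))  # high overlap
print(unigram_overlap("dogs bark loudly", reference))      # no overlap
```

The first output shares five of its six words with the reference, so both numbers come out at 5/6. The second shares none, so both are 0.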