BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models (2023)

This work aims to make language models more relevant as models of language acquisition in infants.

In particular, we advocate that language models should be:

  1. trained on developmentally plausible corpora.
  2. evaluated on appropriate benchmarks.

To this end, we propose a language-acquisition-friendly benchmark to evaluate written or spoken language models at the lexical and syntactic levels.

Examples of audio files used in our benchmark are available on this webpage.

Spot-the-word task

In this task, the model receives a real word (✓) and a pseudo-word (✗) matched for phonotactic probability.

The model scores 1 if it assigns a higher probability to the real word than to the pseudo-word, and 0 otherwise.

✓ hello
✗ lello
✗ pello
✓ cookie
✗ cootie
✗ boodie
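For illustration, the pairwise comparison can be sketched as follows. The `log_prob` function and the toy probability table are hypothetical stand-ins for any spoken or written language model's scoring function; the benchmark only needs the relative comparison between the two items.

```python
def pair_score(log_prob, real_word, pseudo_word):
    """Return 1 if the model assigns the real word a higher
    log-probability than the matched pseudo-word, 0 otherwise."""
    return int(log_prob(real_word) > log_prob(pseudo_word))

# Toy stand-in for a model: a fixed log-probability lookup table.
toy_log_probs = {"hello": -4.0, "lello": -9.0, "pello": -8.5}.get

pair_score(toy_log_probs, "hello", "lello")  # 1: real word preferred
pair_score(toy_log_probs, "hello", "pello")  # 1: real word preferred
```

Overall lexical accuracy is then simply the mean of these 0/1 scores over all word/pseudo-word pairs.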

Grammatical acceptability judgment task

In this task, the model receives a grammatical (✓) and an ungrammatical (✗) sentence.

The model scores 1 if it assigns a higher probability to the grammatical sentence than to the ungrammatical one, and 0 otherwise.

✓ The angry dragon
✗ The dragon angry
✓ The prince needs the princess
✗ The prince need the princess
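The sentence-level comparison works the same way once the model can score a whole sentence. A common way to do this with an autoregressive language model is to sum per-token log-probabilities; the toy unigram table below is a hypothetical stand-in (and, being a unigram model, it cannot capture word order, so real evaluations use models that score full sequences in context).

```python
# Hypothetical toy unigram log-probability table, for illustration only.
TOY_LOG_PROBS = {"the": -1.0, "prince": -5.0, "needs": -6.0,
                 "need": -7.5, "princess": -5.5}

def sentence_log_prob(sentence):
    """Sum per-token log-probabilities; unknown tokens get a low score."""
    return sum(TOY_LOG_PROBS.get(tok, -10.0)
               for tok in sentence.lower().split())

def acceptability_score(grammatical, ungrammatical):
    """Return 1 if the grammatical sentence is preferred, 0 otherwise."""
    return int(sentence_log_prob(grammatical)
               > sentence_log_prob(ungrammatical))

acceptability_score("The prince needs the princess",
                    "The prince need the princess")  # 1 under this toy table
```

As at the lexical level, syntactic accuracy is the mean of these 0/1 scores over all sentence pairs.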