BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models (2023)

This work aims to make language models more relevant as models of language acquisition in infants.

In particular, we advocate that language models should be:

  1. trained on developmentally plausible corpora.
  2. evaluated on appropriate benchmarks.

To this end, we propose a language-acquisition-friendly benchmark to evaluate written or spoken language models at the lexical and syntactic levels.

Examples of audio files used in our benchmark are available on this webpage.

Spot-the-word task

In this task, the model receives a real word (✓) and a pseudo-word (✗) matched for phonotactic probability.

The model scores 1 if it assigns a higher probability to the real word than to the pseudo-word, and 0 otherwise.

✓ hello
✗ lello
✗ pello
✓ cookie
✗ cootie
✗ boodie
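For illustration, the pairwise comparison can be sketched as follows. The `log_prob` function and the toy probability table are hypothetical stand-ins for any spoken or written language model's scoring function; the benchmark only needs the relative comparison between the two items.

```python
def pair_score(log_prob, real_word, pseudo_word):
    """Return 1 if the model assigns the real word a higher
    log-probability than the matched pseudo-word, 0 otherwise."""
    return int(log_prob(real_word) > log_prob(pseudo_word))

# Toy stand-in for a model: a fixed log-probability lookup table.
toy_log_probs = {"hello": -4.0, "lello": -9.0, "pello": -8.5}.get

pair_score(toy_log_probs, "hello", "lello")  # 1: real word preferred
pair_score(toy_log_probs, "hello", "pello")  # 1: real word preferred
```

Overall lexical accuracy is then simply the mean of these 0/1 scores over all word/pseudo-word pairs.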

Grammatical acceptability judgment task

In this task, the model receives a grammatical (✓) and an ungrammatical (✗) sentence.

The model scores 1 if it assigns a higher probability to the grammatical sentence than to the ungrammatical one, and 0 otherwise.

✓ The angry dragon
✗ The dragon angry
✓ The prince needs the princess
✗ The prince need the princess
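The sentence-level comparison works the same way once the model can score a whole sentence. A common way to do this with an autoregressive language model is to sum per-token log-probabilities; the toy unigram table below is a hypothetical stand-in (and, being a unigram model, it cannot capture word order, so real evaluations use models that score full sequences in context).

```python
# Hypothetical toy unigram log-probability table, for illustration only.
TOY_LOG_PROBS = {"the": -1.0, "prince": -5.0, "needs": -6.0,
                 "need": -7.5, "princess": -5.5}

def sentence_log_prob(sentence):
    """Sum per-token log-probabilities; unknown tokens get a low score."""
    return sum(TOY_LOG_PROBS.get(tok, -10.0)
               for tok in sentence.lower().split())

def acceptability_score(grammatical, ungrammatical):
    """Return 1 if the grammatical sentence is preferred, 0 otherwise."""
    return int(sentence_log_prob(grammatical)
               > sentence_log_prob(ungrammatical))

acceptability_score("The prince needs the princess",
                    "The prince need the princess")  # 1 under this toy table
```

As at the lexical level, syntactic accuracy is the mean of these 0/1 scores over all sentence pairs.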