BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models (2023)
This work aims to make language models more relevant as models of language acquisition in infants.
In particular, we advocate that language models should be:
- trained on developmentally plausible corpora.
- evaluated on appropriate benchmarks.
To this end, we propose a language-acquisition-friendly benchmark to evaluate written or spoken language models at the lexical and syntactic levels.
Examples of audio files used in our benchmark are available on this webpage.
Spot-the-word task
In this task, the model receives a word (✓) and a pseudo-word (✗) matched in phonotactic probability.
The model gets a score of 1 if it successfully assigns a higher probability to the real word than to the pseudo-word. It obtains a score of 0 otherwise.
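The scoring rule above can be sketched in a few lines of Python. This is only an illustration, not the benchmark's actual code: the function name `spot_the_word_score` and the log-probability values are hypothetical, and the sketch assumes the model's probabilities for the two items have already been computed (here as log-probabilities, which preserve the ordering).

```python
def spot_the_word_score(logprob_word: float, logprob_pseudo_word: float) -> int:
    """Return 1 if the real word is strictly more probable than the pseudo-word, else 0."""
    return 1 if logprob_word > logprob_pseudo_word else 0


# Hypothetical log-probabilities, for illustration only.
real_word_logprob = -4.2
pseudo_word_logprob = -7.9

score = spot_the_word_score(real_word_logprob, pseudo_word_logprob)
tie_score = spot_the_word_score(-6.0, -6.0)  # a tie is not a "higher probability"
```

Because the rule requires a strictly higher probability for the real word, a tie yields a score of 0 in this sketch.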
Grammatical acceptability judgment task
In this task, the model receives a grammatical (✓) and an ungrammatical (✗) sentence.
The model gets a score of 1 if it successfully assigns a higher probability to the grammatical sentence than to the ungrammatical one. It obtains a score of 0 otherwise.
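As a rough sketch of how these per-pair scores might be combined, the snippet below averages them over a set of sentence pairs. Averaging into a single accuracy is an assumption made here for illustration; the function `acceptability_accuracy`, the toy scorer, the example sentences, and their log-probabilities are all hypothetical and stand in for a real model.

```python
from typing import Callable, Iterable, Tuple


def acceptability_accuracy(
    pairs: Iterable[Tuple[str, str]],
    log_prob: Callable[[str], float],
) -> float:
    """Average the 0/1 scores over (grammatical, ungrammatical) sentence pairs."""
    scores = [
        1 if log_prob(grammatical) > log_prob(ungrammatical) else 0
        for grammatical, ungrammatical in pairs
    ]
    if not scores:
        raise ValueError("at least one sentence pair is required")
    return sum(scores) / len(scores)


# Hypothetical log-probabilities standing in for a real language model.
toy_log_probs = {
    "the dog sleeps": -10.0,
    "the dog sleep": -12.5,
    "she eats apples": -11.0,
    "she eat apples": -10.5,
}

toy_pairs = [
    ("the dog sleeps", "the dog sleep"),
    ("she eats apples", "she eat apples"),
]

accuracy = acceptability_accuracy(toy_pairs, toy_log_probs.__getitem__)
```

With these made-up numbers the toy scorer prefers the grammatical sentence in the first pair only, so it is right on one of the two pairs.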