This work aims to make language models more relevant for modeling language acquisition in infants.
In particular, we advocate that language models should be:
- trained on developmentally plausible corpora.
- evaluated on appropriate benchmarks.
To this end, we propose a language-acquisition-friendly benchmark to evaluate written or spoken language models at the lexical and syntactic levels.
Examples of audio files used in our benchmark are available on this webpage.
In the lexical task, the model receives a real word (✓) and a pseudo-word (✗) matched in phonotactic probability.
The model gets a score of 1 if it assigns a higher probability to the real word than to the pseudo-word, and a score of 0 otherwise.
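The scoring rule for a word/pseudo-word pair can be sketched as follows. This is a minimal illustration, not the benchmark's own code: `pair_score`, `lexical_accuracy`, and the toy `log_prob` lookup are hypothetical names, and the log-probabilities are made-up values standing in for whatever a real model would output. Averaging the pair scores into an accuracy is also an assumption about how scores are aggregated.

```python
from typing import Callable, Iterable, Tuple


def pair_score(logp_real: float, logp_pseudo: float) -> int:
    """Return 1 if the real word is strictly more probable, else 0 (ties score 0)."""
    return 1 if logp_real > logp_pseudo else 0


def lexical_accuracy(
    pairs: Iterable[Tuple[str, str]],
    log_prob: Callable[[str], float],
) -> float:
    """Average the 0/1 scores over (real word, pseudo-word) pairs."""
    scores = [pair_score(log_prob(real), log_prob(pseudo)) for real, pseudo in pairs]
    if not scores:
        raise ValueError("at least one pair is required")
    return sum(scores) / len(scores)


# Toy log-probabilities (invented for illustration only).
TOY_LOG_PROBS = {"rabbit": -7.1, "rappit": -9.4, "cookie": -8.0, "cootie": -7.5}


def toy_log_prob(item: str) -> float:
    return TOY_LOG_PROBS[item]


accuracy = lexical_accuracy([("rabbit", "rappit"), ("cookie", "cootie")], toy_log_prob)
```

In this toy example the model prefers "rabbit" over "rappit" but wrongly prefers "cootie" over "cookie", so one of the two pairs is scored 1.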
Grammatical acceptability judgment task
In this task, the model receives a grammatical (✓) and an ungrammatical (✗) sentence.
The model gets a score of 1 if it assigns a higher probability to the grammatical sentence than to the ungrammatical one, and a score of 0 otherwise.
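One way to obtain the sentence probabilities being compared is to sum per-token log-probabilities (the chain rule in log space). The sketch below assumes the model exposes such per-token values; how the benchmark actually derives sentence probabilities, and whether it normalizes for length, is not specified here. `sentence_log_prob` and `acceptability_score` are hypothetical names, and the numbers are invented.

```python
import math
from typing import Sequence


def sentence_log_prob(token_log_probs: Sequence[float]) -> float:
    """Sum per-token log-probabilities to get the log-probability of the sentence."""
    return math.fsum(token_log_probs)


def acceptability_score(
    grammatical_token_lps: Sequence[float],
    ungrammatical_token_lps: Sequence[float],
) -> int:
    """Return 1 if the grammatical sentence is strictly more probable, else 0."""
    lp_good = sentence_log_prob(grammatical_token_lps)
    lp_bad = sentence_log_prob(ungrammatical_token_lps)
    return 1 if lp_good > lp_bad else 0


# Invented per-token log-probabilities for a minimal pair that differs in its last token.
grammatical = [-1.2, -0.8, -2.0]
ungrammatical = [-1.2, -0.8, -4.5]
score = acceptability_score(grammatical, ungrammatical)
```

Because the two sentences of a pair may differ in length, an unnormalized sum can favor the shorter one; dividing by the number of tokens is a common alternative, mentioned here only as a design consideration.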