Statistical learning models of early phonetic acquisition struggle with child-centered audio data (2022)
In this work, we trained a self-supervised representation learning model (Contrastive Predictive Coding) on:
- child-centered long-form recordings (acquired with child-worn microphones)
- audiobooks (clean read speech commonly used in self-supervised representation learning)
- simulated long-forms, i.e., audiobooks contaminated with additive noise and reverberation
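For readers less familiar with Contrastive Predictive Coding, here is a minimal sketch of a CPC-style training objective in PyTorch. The layer sizes, the number of prediction steps, and the use of within-utterance negatives are illustrative assumptions, not the exact configuration trained in this work.

```python
# Minimal CPC-style objective: an encoder turns raw audio into latent frames,
# an autoregressive network builds a context, and linear predictors must pick
# the true future latent among negatives (InfoNCE). Hyperparameters are
# illustrative, not the configuration used in this work.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CPC(nn.Module):
    def __init__(self, hidden_dim=256, n_predictions=12):
        super().__init__()
        # Strided 1-D convolutions map the waveform to latent frames z_t.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, hidden_dim, kernel_size=10, stride=5, padding=3), nn.ReLU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=4, stride=2, padding=1), nn.ReLU(),
        )
        # Autoregressive model summarises past latents into a context c_t.
        self.context = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        # One linear predictor per future step k = 1..n_predictions.
        self.predictors = nn.ModuleList(
            nn.Linear(hidden_dim, hidden_dim) for _ in range(n_predictions)
        )
        self.n_predictions = n_predictions

    def forward(self, waveform):
        # waveform: (batch, samples)
        z = self.encoder(waveform.unsqueeze(1)).transpose(1, 2)  # (batch, T, dim)
        c, _ = self.context(z)                                   # (batch, T, dim)
        T, loss = z.size(1), 0.0
        for k, predictor in enumerate(self.predictors, start=1):
            pred = predictor(c[:, : T - k])   # predictions for z_{t+k}
            target = z[:, k:]                 # true future latents
            # InfoNCE: each prediction must identify the true future frame
            # among all other frames of the same utterance (negatives).
            logits = torch.einsum("btd,bsd->bts", pred, target)
            labels = torch.arange(T - k, device=z.device).expand(z.size(0), -1)
            loss = loss + F.cross_entropy(logits.flatten(0, 1), labels.flatten())
        return loss / self.n_predictions

# Example usage on random audio standing in for a real training batch.
model = CPC()
clips = torch.randn(4, 16000)  # four one-second clips at 16 kHz
model(clips).backward()
```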
Below, you’ll find some audio samples that the model was exposed to during training.
Child-centered long-form recordings
Audio samples extracted from child-centered long-form recordings:
Audiobooks
Audio samples extracted from audiobooks:
Simulated long-forms
The same audio samples contaminated with additive noise and reverberation to simulate the challenging acoustic conditions found in long-forms:
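The contamination itself can be approximated with standard signal-processing tools. Below is a minimal sketch assuming mono WAV files and a single room impulse response; the file names, the target SNR, and the use of soundfile/scipy are illustrative assumptions rather than the exact pipeline used in this work.

```python
# Sketch of contaminating a clean audiobook clip with reverberation and
# additive noise. File names, SNR, and libraries are illustrative only.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

speech, sr = sf.read("audiobook_clip.wav")         # clean read speech (mono)
noise, _ = sf.read("household_noise.wav")          # additive background noise (mono)
rir, _ = sf.read("room_impulse_response.wav")      # room impulse response (mono)

# Reverberation: convolve the clean speech with the room impulse response.
reverberant = fftconvolve(speech, rir)[: len(speech)]

# Additive noise: scale the noise to reach a target signal-to-noise ratio.
target_snr_db = 5.0
noise = np.resize(noise, len(reverberant))
speech_power = np.mean(reverberant ** 2)
noise_power = np.mean(noise ** 2)
scale = np.sqrt(speech_power / (noise_power * 10 ** (target_snr_db / 10)))
contaminated = reverberant + scale * noise

# Avoid clipping before writing the simulated long-form clip to disk.
contaminated /= max(1.0, np.max(np.abs(contaminated)))
sf.write("simulated_longform_clip.wav", contaminated, sr)
```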