Statistical learning models of early phonetic acquisition struggle with child-centered audio data (2022)
In this work, we trained a self-supervised representation learning model (Contrastive Predictive Coding) on:
child-centered long-form recordings (acquired with child-worn microphones)
audiobooks (clean read-speech commonly used in self-supervised representation learning)
simulated long-form recordings, i.e., audiobooks contaminated with additive noise and reverberation.
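The third condition, contaminating clean read speech with additive noise and reverberation, can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the function name `contaminate`, the target SNR, and the use of a generic room impulse response are assumptions for the example.

```python
import numpy as np
from scipy.signal import fftconvolve

def contaminate(clean, noise, rir, snr_db=10.0):
    """Add noise at a target SNR, then apply reverberation via RIR convolution.

    clean, noise: 1-D float arrays at the same sample rate.
    rir: room impulse response (1-D float array).
    """
    # Tile or trim the noise to match the clean signal's length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise) == snr_db.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    noise = noise * np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    noisy = clean + noise
    # Reverberation: convolve with the room impulse response, keep original length.
    return fftconvolve(noisy, rir)[: len(clean)]
```

In practice, the noise and impulse responses would be drawn from recorded environmental noise and measured room acoustics so that the simulated recordings approximate the conditions of child-worn microphones.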
Below, you’ll find some audio samples that the model is exposed to during training.
Child-centered long-form recordings
Audio samples extracted from child-centered long-form recordings:
Audiobooks
Audio samples extracted from audiobooks:
Simulated long-form recordings
The same audio samples contaminated with additive noise and reverberation to simulate the challenging acoustic conditions found in long-form recordings: