Using Transparent Machine Learning to Study Human Speech
Machine learning, the use of statistical models to analyze and make predictions from data, has a long history in speech recognition and natural language processing, but its use has largely been limited to applied engineering tasks. This talk describes two more research-focused applications of transparent machine learning algorithms to the study of speech perception and production.
For speech perception, we examine the difficult problem of identifying the acoustic cues to a complex phonetic contrast, in this case vowel nasality. By training machine learning algorithms on acoustic measurements, we can directly estimate how informative each acoustic feature is to the contrast. This by-feature informativeness data was then used to generate hypotheses about human cue usage and to model observed human patterns of perception, showing that these models predict not only which cue listeners rely on, but also the subtler patterns of perception arising from changes to less informative cues.
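As an illustration of the kind of per-feature informativeness measure described above, the sketch below scores each acoustic feature by its class separability (Cohen's d) on synthetic data. The feature names, distributions, and effect sizes are invented for the example and are not the talk's actual measurements or method; any transparent importance measure could stand in here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical acoustic measurements for oral vs. nasal vowels.
# Columns: A1-P0 (dB), F1 bandwidth (Hz), duration (ms).
# Effect sizes are illustrative only: the spectral cues separate the
# classes well, while duration is nearly uninformative.
oral = np.column_stack([
    rng.normal(10.0, 2.0, n),
    rng.normal(80.0, 15.0, n),
    rng.normal(120.0, 25.0, n),
])
nasal = np.column_stack([
    rng.normal(4.0, 2.0, n),
    rng.normal(110.0, 15.0, n),
    rng.normal(125.0, 25.0, n),
])

def informativeness(a, b):
    """Per-feature separability (Cohen's d): a transparent proxy for
    how strongly each cue distinguishes the two classes."""
    pooled_sd = np.sqrt((a.var(axis=0) + b.var(axis=0)) / 2)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled_sd

d = informativeness(oral, nasal)
for name, score in zip(["A1-P0", "F1 bandwidth", "duration"], d):
    print(f"{name}: d = {score:.2f}")
```

A by-feature ranking like this can then seed hypotheses about which cues listeners actually exploit, to be tested against perception data.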
For speech production, we focus on data from Electromagnetic Articulography (EMA), which provides position data for the articulators with high temporal and spatial resolution, and discuss our ongoing efforts to identify and characterize pause postures (specific vocal tract configurations at prosodic boundaries, cf. Katsika et al. 2014) in the speech of 7 speakers of American English. Here, the lip aperture trajectories of over 800 individual pauses were annotated by a member of the research team to serve as a gold standard, and then subjected to principal component analysis. These analyses were then used to train a support vector machine (SVM) classifier, which achieved 96% classification accuracy in cross-validation tests, with a Cohen's kappa of 0.79 for machine-to-annotator agreement, suggesting the potential for improvements in speed, consistency, and objective characterization of gestures.
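The PCA-then-SVM pipeline can be sketched as follows on synthetic trajectories. The curve shapes, the plateau used to mimic a pause posture, and all parameter choices (number of components, kernel, fold count) are illustrative assumptions, not the study's actual data or settings; scikit-learn is used here as a convenient stand-in implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)

def trajectories(has_posture, n):
    """Synthetic lip-aperture curves. A pause posture is mimicked as a
    sustained plateau mid-trajectory; shapes are illustrative only."""
    base = np.sin(np.pi * t)                    # simple open-close cycle
    curves = base + rng.normal(0, 0.05, (n, t.size))
    if has_posture:
        curves[:, 20:35] += 0.5                 # hold near the boundary
    return curves

X = np.vstack([trajectories(False, 60), trajectories(True, 60)])
y = np.repeat([0, 1], 60)                       # 0 = no posture, 1 = posture

# Reduce each curve to a few principal-component scores, then classify.
scores = PCA(n_components=5).fit_transform(X)
clf = SVC(kernel="linear")
acc = cross_val_score(clf, scores, y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")
```

Representing each trajectory by a handful of PCA scores keeps the classifier's input low-dimensional and interpretable: each component is itself a curve shape whose contribution to the decision can be inspected.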
These two methods, modeling feature importance and classifying articulatory curves with transparent and interpretable machine learning, demonstrate concrete approaches applicable to a variety of questions in phonetics and, potentially, in linguistics more generally.