In this dissertation I investigated, using coarticulatory /u/-fronting in the alveolar context as a case study, how native speakers of American English produce coarticulatory variation and how they perceive and reproduce continuously varying speech sounds heard in coarticulatory and non-coarticulatory contexts.
The production study addressed the question of whether coarticulatory fronting of /u/ in alveolar contexts in American English is an inevitable consequence of production constraints or is produced under active speaker control. The study found that: (1) the relative acoustic difference between the fronted /u/ and the non-fronted /u/ persisted across an elicited range of vowel durations; and (2) the degree of acoustic variability was smaller for the fronted /u/ than for the non-fronted /u/. These results indicate that speakers of American English have a distinct and more narrowly specified articulatory target for the fronted /u/ in the alveolar context than for the non-fronted /u/.
The perception study addressed the issue of individual variation and compensation for coarticulation. The study found within-subject consistency in the classification of /CVC/ stimuli in both compensatory and non-compensatory contexts. It found no evidence for a within-subject perception-production link, but it did find positive evidence for a relationship between linguistic experience and speech perception: the distributional characteristics of the fronted and the non-fronted variants of /u/ in production data (a proxy for ambient language data) resembled the ranges of variation in perceptual responses to /CVC/ stimuli in the fronting and the non-fronting contexts. Together, these results suggest that the source of individual variation in speech perception is differences in the phonological grammar (perceptual category boundary) that guides speech perception, and that this perception grammar emerges in response to the ambient language data.
Finally, the vowel repetition study examined how perceptual compensation for coarticulation and individual differences in speech perception affect vowel repetition performance. This study found that: (1) ambiguous vowels were repeated with a significantly lower F2 when they were heard in the fronting context than in the non-fronting context; (2) a given stimulus was repeated by some listeners unambiguously as a vowel belonging to the speaker's /i/ category on all trials, yet the same stimulus was repeated by other listeners unambiguously as a vowel belonging to that speaker's /u/ category on all trials; and (3) the perceptual category boundary was a significant predictor of the repeated vowel's F2 value. Based on these results, it was hypothesized that one source of pronunciation variation in a given community is individual variation in speech perception, which yields variable mental representations across listeners when they encounter ambiguous speech.
One general pattern found in all experiments was vowel-specific variability: responses to /i/ were less variable than responses to /u/ in the production task, and /i/-like stimuli were repeated less variably than /u/-like stimuli in the vowel repetition task. Similarly, /u/ consistently elicited less variability in the fronting context than in the non-fronting context in the production, perception, and vowel repetition tasks. More broadly, I contend that speech forms a dynamic system, characterized by mutual dependency and multiple causal loops among speech perception, speech production, knowledge about pronunciation norms, and ambient language data. These properties of language use govern the output of communicative interactions among members of a speech community, and one such output is members' knowledge of the multiple sub-phonemic pronunciation categories that exist in any speech community. Additionally, I argue that any speech community is in a constant state of readiness to respond to an innovative pronunciation as a new community norm, because members have a variable but rich pronunciation repertoire even when there is no observable community-level sound change.