Speech-language pathologists (SLPs) provide speech therapy services to children and adults, improving all aspects of speech and oral functioning. SLPs can correct even the most severe impairments in under three years, provided the patient stays motivated. Unfortunately, SLPs can ensure motivation only during weekly one-hour meetings. Outside of the office, patients increasingly depend on their friends and family for encouragement and support. As therapy slows due to factors outside the SLP's control, patients risk joining the two-thirds of people who drop out of speech therapy.
This thesis discusses the development and evaluation of a suite of tools intended to address the dearth of technology plaguing modern speech therapy.
Applying a user-centered design approach, I identify the primary stakeholders as the SLP and the patient, with the support network of family and friends as secondary. From this perspective I uncover the missing elements and develop tools that demonstrate both the value of each element and the benefits of improved access. These tools introduce three novel technologies to speech therapy: accurate speech evaluation via automatic speech recognition (ASR), gamification of speech therapy exercises, and big data analysis of complex ASR output. Combined, they tackle the primary bottleneck in speech therapy today: what occurs outside of the office.
In the first section, I collaborate with doctors at the University of California, Davis Medical Center. Together we identify the primary problem in speech therapy as sustaining progress outside of the office, and we agree that a properly calibrated ASR system would provide more consistent grading than support network members. Building on this, I develop a mobile speech recognition system that can be tuned to virtually any speech impairment. I tune the system for a lateral lisp and evaluate it with an undergraduate student at the University of California, Santa Cruz. The system meets our gold standard, achieving 95% accuracy.
In the second section, I embed the aforementioned ASR system into a game engine with the goal of producing motivating speech exercises. Through investigations with therapists and observations of children, I build a system that uses an interactive storybook game to automatically evaluate the speech characteristics of children with cleft palate. The system is successfully deployed in a clinical setting with children visiting a therapist for speech therapy evaluation. It correctly evaluates 8 of 9 children, on par with an SLP, demonstrating that ASR can serve as a reliable speech evaluator. Interviews confirm that the experience is, in the short term, as motivating as working with a live SLP.
In the third and final section, I add big data analysis to the existing toolset to provide therapists with effective feedback. I design with the goal of maximizing data visualization while minimizing mental and temporal demands, resulting in a statistics engine that affords the therapist quantitative metrics at a glance. The speech recognition engine is modified to process speech mid-sentence, enabling real-time interaction. Because this permits a higher rate of speech feedback, I shift the game design from a storybook format to a cycling series of mini-games that maximizes in-game speech rates. The system is distributed to therapists at UC Davis and evaluated with several adult communities in Santa Cruz. Therapists give positive feedback on all aspects, with the tool providing both increased motivation in the office and increased visibility into tasks outside of it. On the patient side, the game environment successfully re-engages 60% of adult speech therapy dropouts. One participant demonstrates steady speech improvement through use of the game, and the SLP confirms that the improvement transfers to real speech scenarios, validating the design.