Adapting a Text-to-Speech Synthesizer to Convey User Identity
Lab Members: Rupal Patel, Shanqing Cai
Collaborators: Tim Bunnell (A.I. duPont Hospital for Children)
Supported by NSF Grant No. ITR-0712821.
The goal of this project is to advance computerized speech synthesis methods so that they can better approximate the unique vocal characteristics of individual human speakers. To date, even state-of-the-art text-to-speech (TTS) synthesis cannot capture the flexibility of the natural human voice. While voice quality may not matter for many TTS applications, it is essential for assistive communication aids, which are meant to be an extension of the user. Over two million Americans have severe speech and motor impairments that require the use of an assistive communication aid with TTS-based output.
Synthetic voices on commercially available devices are not representative of the user along basic dimensions such as age, gender, rate of speech, and voice quality, thus drawing unnecessary attention to the device, detracting from the spoken message, and impeding social integration. This project aims to harness the residual vocal control in the productions of individuals with severe speech impairment in order to adapt a text-to-speech synthesizer such that the resulting voice resembles that of the user.
Jreige, C., Patel, R., & Bunnell, H. (2009). VocaliD: Personalizing text-to-speech synthesis for individuals with severe speech impairment. SIGACCESS Conference on Computers and Accessibility, Pittsburgh, Pennsylvania, October 2009.