|
The ASY Synthesis Process

The Articulatory Synthesis program (ASY) is a software speech synthesis system. At the heart of this system is a model of the vocal tract in the midsagittal plane (viewed from the side, as shown above). There are 6 key parameters in this model: the tongue body center (C, 2 degrees of freedom (df)), the tongue tip (T, 2 df), the jaw (J, 1 df), the lips (L, 2 df), the velum (V, 1 df), and the hyoid (H, 2 df, controlling larynx height and pharynx width). The tongue tip is a structure that rests on the tongue body, which is implemented as a ball. In turn, the tongue ball rests on the jaw. In the actual program, the vocal tract can be reconfigured by clicking on one of the articulators and repositioning it.

Once these parameters have been specified (either by providing them in a table of numbers or by manipulating the midsagittal display using graphical tools), the tract is converted into a series of uniform tube sections. To accomplish this, a grid is first imposed over the tract to aid in determining cross-sectional areas. These sections are then normalized to a length of 0.875 cm, and rules (for different sections of the tract) are used to impose a third dimension. The resulting array of areas is then used to calculate the transfer function of the tract.

Sound production follows a source-filter approach: a glottal voicing source (Rosenberg type B waveshape) is used to excite the tract, and both the open and speed quotients of this source can be controlled. Fricative noise sources are simulated by inserting a shaped noise component in front of the constriction in the tract. Tissue losses and glottal impedance are modelled with lumped parameters. Once the transfer function has been calculated, it is implemented as a digital filter, and sound is produced as a file of digital values, which are then converted to analog form using speech output hardware (12-bit, 10 kHz or 20 kHz).
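To make the voicing source more concrete, here is a minimal Python sketch of one period of a Rosenberg type B pulse, using the commonly cited polynomial form (a cubic opening phase followed by a parabolic closing phase). This is not ASY's own code. The function name `rosenberg_b`, its default values, and the definitions used here (open quotient as the open fraction of the period, speed quotient as opening time divided by closing time) are assumptions made for illustration.

```python
import numpy as np

def rosenberg_b(f0, fs, open_quotient=0.6, speed_quotient=2.0):
    """Return one period of a Rosenberg type B glottal pulse (hypothetical helper).

    f0: fundamental frequency in Hz; fs: sampling rate in Hz.
    open_quotient: assumed to be the fraction of the period the glottis is open.
    speed_quotient: assumed to be opening time divided by closing time.
    """
    n_period = int(round(fs / f0))                 # samples in one pitch period
    n_open = int(round(open_quotient * n_period))  # samples with the glottis open
    n_rise = int(round(n_open * speed_quotient / (1.0 + speed_quotient)))
    n_fall = n_open - n_rise

    pulse = np.zeros(n_period)                     # closed phase stays at zero
    t_rise = np.arange(n_rise) / n_rise            # normalized time, 0 <= t < 1
    pulse[:n_rise] = 3 * t_rise**2 - 2 * t_rise**3 # cubic opening phase
    t_fall = np.arange(n_fall) / n_fall
    pulse[n_rise:n_open] = 1 - t_fall**2           # parabolic closing phase
    return pulse

# Example: a 100 Hz pulse at the 10 kHz output rate mentioned in the text.
pulse = rosenberg_b(100.0, 10000.0)
```

Raising the open quotient lengthens the open phase within each period, and raising the speed quotient makes the pulse more skewed towards a slow opening and a fast closing.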
Synthesis can be produced for either static tract shapes or for dynamic utterances. In the latter case, two tables are used to control the synthesis process. The first table (script) provides a specification of key tract shapes. The second table (control) provides timing and source information. Values in script tables are linearly interpolated between key shapes where necessary. Synthesis proceeds one pitch pulse at a time, and the duration of each pulse is determined by the fundamental frequency specified in the control table. |
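The pulse-by-pulse scheme described above can be sketched as follows. The table layouts, values, and names (`script_times`, `script_shapes`, `control_times`, `control_f0`, `pulse_schedule`) are hypothetical and do not reflect ASY's actual file format. The sketch also assumes that the fundamental frequency is interpolated between control entries, which the description does not state.

```python
import numpy as np

# Hypothetical script table: key times (s) and two articulator parameters per key shape.
script_times = np.array([0.0, 0.1, 0.2])
script_shapes = np.array([[0.0, 1.0],
                          [1.0, 3.0],
                          [0.5, 2.0]])

# Hypothetical control table: times (s) and fundamental frequency (Hz).
control_times = np.array([0.0, 0.2])
control_f0 = np.array([100.0, 125.0])

def pulse_schedule(script_times, script_shapes, control_times, control_f0):
    """Return a list of (start_time, duration, tract_parameters), one per pitch pulse."""
    frames = []
    t = 0.0
    end = script_times[-1]
    while t < end:
        # Assumption: f0 is linearly interpolated between control entries.
        f0 = np.interp(t, control_times, control_f0)
        # Linearly interpolate each parameter column between key tract shapes.
        shape = np.array([np.interp(t, script_times, column)
                          for column in script_shapes.T])
        duration = 1.0 / f0          # one pitch pulse lasts one fundamental period
        frames.append((t, duration, shape))
        t += duration                # the next pulse starts where this one ends
    return frames

frames = pulse_schedule(script_times, script_shapes, control_times, control_f0)
```

Each entry in `frames` describes one pitch pulse: when it starts, how long it lasts, and the interpolated tract parameters from which an area function and filter would be computed for that pulse.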