Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Hwang, SH | en_US |
dc.contributor.author | Chen, SH | en_US |
dc.contributor.author | Wang, YR | en_US |
dc.date.accessioned | 2014-12-08T15:27:33Z | - |
dc.date.available | 2014-12-08T15:27:33Z | - |
dc.date.issued | 1996 | en_US |
dc.identifier.isbn | 0-7803-3555-4 | en_US |
dc.identifier.uri | http://hdl.handle.net/11536/19787 | - |
dc.description.abstract | In this paper, the implementation of a high-performance Mandarin TTS system is presented. The system is composed of four main parts: text analysis (TA), prosodic information generation (PIG), a waveform table (WT) of 411 base-syllables, and PSOLA-based waveform synthesis (PSOLA). In TA, a statistical-model-based method is first employed to automatically tag the input text to obtain the word sequence and the associated part-of-speech (POS) sequence. A lexicon containing about 80000 words is used in the tagging process. Then the corresponding base-syllable sequence is found and used to retrieve the basic waveform sequence from WT. Some linguistic features used in PIG are also extracted in TA. In PIG, a four-layer recurrent neural network (RNN) is employed to generate prosodic information including the pitch contour, energy level, initial duration and final duration of each syllable, as well as the inter-syllable pause duration. Finally, in PSOLA the basic waveform sequence is modified using the prosodic information to generate the output synthetic speech. The whole system is implemented in software on a PC/AT 486 with a 16-bit Sound Blaster add-on card. Only 3.2 Mbytes of memory are required. It can synthesize speech in real time for any input Chinese text. Informal listening tests by many native Chinese speakers living in Taiwan confirmed that the synthetic speech sounded very fluent and natural. | en_US |
dc.language.iso | en_US | en_US |
dc.title | A Mandarin text-to-speech system | en_US |
dc.type | Proceedings Paper | en_US |
dc.identifier.journal | ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4 | en_US |
dc.citation.spage | 1421 | en_US |
dc.citation.epage | 1424 | en_US |
dc.contributor.department | 交大名義發表 | zh_TW |
dc.contributor.department | 電信工程研究所 | zh_TW |
dc.contributor.department | National Chiao Tung University | en_US |
dc.contributor.department | Institute of Communications Engineering | en_US |
dc.identifier.wosnumber | WOS:A1996BJ20B00358 | - |
Appears in Collections: | Conferences Paper |