NEURAL-NETWORK-BASED F0 TEXT-TO-SPEECH SYNTHESIZER FOR MANDARIN

doi:10.1049/ip-vis:19941421

Title:	NEURAL-NETWORK-BASED F0 TEXT-TO-SPEECH SYNTHESIZER FOR MANDARIN
Authors:	HWANG, SH; CHEN, SH; Institute of Communications Engineering; Center for Telecommunications Research
Keywords:	MANDARIN SPEECH SYNTHESIZER; NEURAL NETWORKS
Publication date:	1 December 1994
Abstract:	A neural-network-based approach to synthesising F0 information for Mandarin text-to-speech is discussed. The basic idea is to use neural networks to model the relationship between linguistic features extracted from the input text and parameters representing the pitch contour of syllables. Two multilayer perceptrons (MLPs) are used to synthesise the mean and the shape of the pitch contour separately, each using different linguistic features. A large set of utterances is employed to train these MLPs using the well-known back-propagation algorithm. Pronunciation rules for generating F0 information are automatically learned and implicitly memorised by the MLPs. During synthesis, parameters representing the mean and shape of the pitch contour of each syllable are generated from linguistic features extracted from the given input text. Simulation results confirmed that this is a promising approach for F0 synthesis: the synthesised pitch contours of syllables match their original counterparts well. Average root-mean-square errors of 0.94 ms/frame and 1.00 ms/frame were achieved.
URI:	http://dx.doi.org/10.1049/ip-vis:19941421 http://hdl.handle.net/11536/2200
ISSN:	1350-245X
DOI:	10.1049/ip-vis:19941421
Journal:	IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING
Volume:	141
Issue:	6
Start page:	384
End page:	390
Appears in collections:	Journal articles
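
Illustrative sketch: the abstract describes two MLPs, trained by back-propagation, that separately predict the mean and the shape of each syllable's pitch contour from linguistic features. The pure-Python sketch below shows that general arrangement only. Everything specific in it is an assumption made for illustration and is not taken from the paper: the `TinyMLP` class, the layer sizes, the learning rate, the five toy input features (a one-hot tone plus a phrase position), the two-value "shape" target, and the synthetic data. For simplicity both networks share the same toy features here, whereas the abstract states that the two MLPs use different linguistic features. The `rmse` helper computes a root-mean-square error of the kind the abstract reports, here on unitless toy values rather than ms/frame.

```python
import math
import random


class TinyMLP:
    """One-hidden-layer perceptron: tanh hidden units, linear outputs (hypothetical sizes)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w1 = init(n_hidden, n_in + 1)   # last column holds the bias weight
        self.w2 = init(n_out, n_hidden + 1)  # last column holds the bias weight

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sum(w * v for w, v in zip(row, hb)) for row in self.w2]
        return xb, hb, y

    def predict(self, x):
        return self.forward(x)[2]

    def train_step(self, x, target, lr=0.05):
        """One stochastic back-propagation update on 0.5 * squared error."""
        xb, hb, y = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]
        # Propagate the output error back to the hidden layer before changing w2.
        delta_h = [
            (1.0 - hb[j] ** 2) * sum(err[k] * self.w2[k][j] for k in range(len(err)))
            for j in range(len(self.w1))
        ]
        for k, e in enumerate(err):
            for j, hv in enumerate(hb):
                self.w2[k][j] -= lr * e * hv
        for j, d in enumerate(delta_h):
            for i, xv in enumerate(xb):
                self.w1[j][i] -= lr * d * xv
        return 0.5 * sum(e * e for e in err)


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference))


def toy_dataset(n, seed=1):
    """Synthetic stand-in data: NOT the paper's features or pitch parameters."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        tone, pos = rng.randrange(4), rng.random()
        x = [1.0 if t == tone else 0.0 for t in range(4)] + [pos]
        mean_target = [0.2 * tone - 0.3 * pos]
        shape_target = [0.1 * (tone - 1.5), 0.2 * pos - 0.1]
        rows.append((x, mean_target, shape_target))
    return rows


def run_epoch(net, pairs, lr=0.05):
    """Train over all pairs once and return the average per-sample loss."""
    return sum(net.train_step(x, t, lr) for x, t in pairs) / len(pairs)


data = toy_dataset(200)
mean_pairs = [(x, m) for x, m, _ in data]
shape_pairs = [(x, s) for x, _, s in data]

mean_net = TinyMLP(5, 6, 1, seed=2)    # predicts the contour mean
shape_net = TinyMLP(5, 6, 2, seed=3)   # predicts the contour shape parameters

first_mean = run_epoch(mean_net, mean_pairs)
first_shape = run_epoch(shape_net, shape_pairs)
for _ in range(50):
    last_mean = run_epoch(mean_net, mean_pairs)
    last_shape = run_epoch(shape_net, shape_pairs)

predicted = [mean_net.predict(x)[0] for x, _ in mean_pairs]
reference = [m[0] for _, m in mean_pairs]
print("mean-net RMSE on toy data:", rmse(predicted, reference))
```

Keeping the two networks separate, as the abstract describes, lets each one be trained and evaluated on its own target. A real system would replace the toy features and targets with linguistic features and pitch-contour parameters extracted from recorded utterances.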

Files in this item:

A1994QB09800005.pdf