A STATISTICAL-MODEL BASED FUNDAMENTAL-FREQUENCY SYNTHESIZER FOR MANDARIN SPEECH

Full metadata record

DC Field	Value	Language
dc.contributor.author	CHEN, SH	en_US
dc.contributor.author	CHANG, S	en_US
dc.contributor.author	LEE, SM	en_US
dc.date.accessioned	2014-12-08T15:04:51Z	-
dc.date.available	2014-12-08T15:04:51Z	-
dc.date.issued	1992-07-01	en_US
dc.identifier.issn	0001-4966	en_US
dc.identifier.uri	http://hdl.handle.net/11536/3355	-
dc.description.abstract	A novel method based on a statistical model for fundamental-frequency (F0) synthesis in Mandarin text-to-speech is proposed. Specifically, a statistical model is employed to determine the relationship between F0 contour patterns of syllables and linguistic features representing the context. Parameters of the model were empirically estimated from a large training set of sentential utterances. Phonologic rules are then automatically deduced through the training process and implicitly memorized in the model. In the synthesis process, contextual features are extracted from a given input text, and the best estimates of F0 contour patterns of syllables are then found by a Viterbi algorithm using the well-trained model. This method can be regarded as employing a stochastic grammar to reduce the number of candidate F0 contour patterns at each decision point of synthesis. Although linguistic features on various levels of input text can be incorporated into the model, only some relevant contextual features extracted from neighboring syllables were used in this study. Performance of this method was examined by simulation using a database composed of nine repetitions of 112 declarative sentential utterances of the same text, all spoken by a single speaker. By closely examining the well-trained model, some evidence was found to show that the declination effect as well as several sandhi rules are implicitly contained in the model. Experimental results show that 77.56% of synthesized F0 contours coincide with the VQ-quantized counterparts of the original natural speech. Naturalness of the synthesized speech was confirmed by an informal listening test.	en_US
dc.language.iso	en_US	en_US
dc.title	A STATISTICAL-MODEL BASED FUNDAMENTAL-FREQUENCY SYNTHESIZER FOR MANDARIN SPEECH	en_US
dc.type	Article	en_US
dc.identifier.journal	JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA	en_US
dc.citation.volume	92	en_US
dc.citation.issue	1	en_US
dc.citation.spage	114	en_US
dc.citation.epage	120	en_US
dc.contributor.department	電信工程研究所	zh_TW
dc.contributor.department	電信研究中心	zh_TW
dc.contributor.department	Institute of Communications Engineering	en_US
dc.contributor.department	Center for Telecommunications Research	en_US
dc.identifier.wosnumber	WOS:A1992JD13400009	-
dc.citation.woscount	10	-
Appears in Collections:	Articles

Files in This Item:

A1992JD13400009.pdf
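
The abstract describes finding the best sequence of syllable F0 contour patterns with a Viterbi algorithm. The record does not give the model's actual structure, so the following is only a generic sketch of Viterbi decoding over a small codebook of contour patterns. The function name `viterbi`, the three score tables, and the toy numbers are all assumptions made for illustration and are not taken from the paper.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Return the best codeword index sequence and its log-score.

    All three tables are hypothetical stand-ins for a trained model:
    log_init[k]     : log-probability that the first syllable uses codeword k
    log_trans[j][k] : log-probability of codeword k following codeword j
    log_emit[t][k]  : log-score of codeword k given the context of syllable t
    """
    n_codes = len(log_init)
    # Best log-score of any path ending in each codeword at the first syllable.
    score = [log_init[k] + log_emit[0][k] for k in range(n_codes)]
    back = []  # back[t-1][k] = best predecessor of codeword k at syllable t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for k in range(n_codes):
            best_j = max(range(n_codes), key=lambda j: score[j] + log_trans[j][k])
            new_score.append(score[best_j] + log_trans[best_j][k] + log_emit[t][k])
            pointers.append(best_j)
        score = new_score
        back.append(pointers)
    # Trace back from the best final codeword.
    last = max(range(n_codes), key=lambda k: score[k])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


def to_log(table):
    """Convert a nested list of probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in table]


# Toy example: 2 contour codewords, 3 syllables (made-up numbers).
log_init = [math.log(0.6), math.log(0.4)]
log_trans = to_log([[0.7, 0.3], [0.4, 0.6]])
log_emit = to_log([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]])

path, log_score = viterbi(log_init, log_trans, log_emit)
print(path, math.exp(log_score))
```

In a real system the context-dependent scores would come from the trained statistical model and the codebook would be the vector-quantized set of F0 contour patterns; here they are fixed tables so the dynamic-programming step can be followed by hand.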