Full metadata record
DC Field | Value | Language
dc.contributor.author | Lin, CT | en_US
dc.contributor.author | Wu, RC | en_US
dc.contributor.author | Chang, JY | en_US
dc.contributor.author | Liang, SF | en_US
dc.date.accessioned | 2014-12-08T15:39:39Z | -
dc.date.available | 2014-12-08T15:39:39Z | -
dc.date.issued | 2004-02-01 | en_US
dc.identifier.issn | 1083-4419 | en_US
dc.identifier.uri | http://dx.doi.org/10.1109/TSMCB.2003.811518 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/27069 | -
dc.description.abstract | In this paper, a new technique for the Chinese text-to-speech (TTS) system is proposed. Our major effort focuses on the generation of prosodic information. New methodologies are developed for constructing fuzzy rules in a prosodic model that simulates human pronunciation rules. The proposed Recurrent Fuzzy Neural Network (RFNN) is a multilayer recurrent neural network (RNN) that integrates a Self-constructing Neural Fuzzy Inference Network (SONFIN) into a recurrent connectionist structure. The RFNN can be functionally divided into two parts. The first part adopts the SONFIN as a prosodic model to explore the relationship between high-level linguistic features and prosodic information based on fuzzy inference rules. Compared to conventional neural networks, the SONFIN can always construct itself with an economical network size at a high learning speed. The second part employs a five-layer network to generate all prosodic parameters by directly using the prosodic fuzzy rules inferred by the first part as well as other important features of syllables. A TTS system combined with the proposed method can reproduce not only sandhi rules but also the other prosodic phenomena found in traditional TTS systems. Moreover, the proposed scheme can even discover some new rules about prosodic phrase structure. The performance of the proposed RFNN-based prosodic model is verified by embedding it into a Chinese TTS system with a Chinese monosyllable database based on the time-domain pitch synchronous overlap add (TD-PSOLA) method. Our experimental results show that the proposed RFNN can generate proper prosodic parameters, including pitch means, pitch shapes, maximum energy levels, syllable duration, and pause duration. Some synthetic sounds are available online for demonstration. | en_US
dc.language.iso | en_US | en_US
dc.subject | Chinese text-to-speech system | en_US
dc.subject | fuzzy inference engine | en_US
dc.subject | prosodic information | en_US
dc.subject | recurrent neural network | en_US
dc.subject | sandhi rules | en_US
dc.subject | speech synthesizer | en_US
dc.title | A novel prosodic-information synthesizer based on recurrent fuzzy neural network for the Chinese TTS system | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.1109/TSMCB.2003.811518 | en_US
dc.identifier.journal | IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS | en_US
dc.citation.volume | 34 | en_US
dc.citation.issue | 1 | en_US
dc.citation.spage | 309 | en_US
dc.citation.epage | 324 | en_US
dc.contributor.department | 電控工程研究所 | zh_TW
dc.contributor.department | Institute of Electrical and Control Engineering | en_US
dc.identifier.wosnumber | WOS:000188464600029 | -
dc.citation.woscount | 7 | -
Appears in Collections: Journal Articles


Files in This Item:

  1. 000188464600029.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.