Title: An Investigation on the Mandarin Prosody of a Parallel Multi-Speaking Rate Speech Corpus (漢語多重語速平行語料庫之韻律研究)
Authors: Tang, Cheng-Chang (湯政璋)
Chen, Sin-Horng (陳信宏)
Institute of Communications Engineering
Keywords: Mandarin; Speech; Prosody; Speaking Rate
Issue Date: 2008
Abstract: This thesis investigates the prosody of a parallel multi-speaking-rate Mandarin read-speech corpus. The corpus consists of four parallel datasets recorded by a professional female announcer at average speaking rates (SRs) of 4.40 (fast), 3.82 (normal), 2.97 (medium), and 2.45 (slow) syllables per second. The prosody models are built with the advanced unsupervised joint prosody labeling and modeling (A-PLM) method previously developed by the speech laboratory of National Chiao Tung University, which performs prosody labeling and modeling jointly and automatically. Because the procedure is automatic, a large amount of speech can be processed, which makes the findings more general; because no manual labeling is involved, the results are also more consistent.
Using this method, the thesis examines how acoustic parameters, prosodic labels, and prosodic constituents vary with SR, including pause duration, the patterns of three high-level prosodic constituents, and the break labels. The analyses could inform the development of prosody generation mechanisms for text-to-speech (TTS) systems and of prosody modeling for automatic speech recognition (ASR) when such systems are extended to a range of SRs.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009413562
http://hdl.handle.net/11536/80823
Appears in Collections: Thesis