Title: Acoustic Modeling and Prosody Modeling for Mandarin Spontaneous-Speech Recognition
Authors: 吳聲鋒
Wu, Sheng-Feng
陳信宏
Chen, Sin-Horng
Institute of Communications Engineering
Keywords: Spontaneous Speech; Speech Recognition
Publication Date: 2014
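Abstract: This study has two parts, which examine acoustic modeling and prosody modeling for Mandarin spontaneous-speech recognition. Spontaneous speech contains many special spoken-language phenomena that do not occur in read speech. To handle these phenomena in acoustic modeling, we use a spontaneous-speech corpus to jointly train context-dependent tri-phone acoustic models (tri-phone HMM models) that cover both normal speech and special speech. For prosody modeling, we build on the previously proposed Hierarchical Prosodic Model (HPM) to design a prosody model suited to spontaneous speech. Fluent and disfluent segments are modeled separately: the modeling units of fluent segments are syllables and syllable-like particles, while the modeling units of disfluent segments include the particles, markers, and similar items that produce hesitation, stuttering, and pauses. The HPM comprises 11 prosodic sub-models that describe the relationships among prosodic-acoustic features, the prosodic tags of the prosodic structure, and linguistic features. Using the trained HPM, we examine the prosody-labeling results for the various special phenomena and find that they show consistent patterns of prosodic variation, which can help detect disfluent segments. Finally, we study how the spontaneous-speech prosody model affects acoustic modeling and use these factors to build a prosody-dependent acoustic model, with the aim of improving the acoustic model's recognition rate for special spoken-language phenomena.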
This thesis addresses two main issues in Mandarin spontaneous-speech recognition: acoustic modeling and prosody modeling. For acoustic modeling, we use the MCDC database to construct context-dependent tri-phone HMM models for both normal speech and particular speech, the latter composed of particles and paralinguistic events. For prosody modeling, we construct a spontaneous-speech prosodic model based on the Hierarchical Prosodic Model (HPM) previously proposed for read Mandarin speech. Normal speech segments and disfluent speech segments are modeled separately. The modeling units of the normal speech segments are syllables and syllable-like particles, while the units for the disfluent speech segments are the particles, markers, and similar items that produce hesitations, stutters, and pauses. In total, the HPM comprises 11 sub-models that describe the relationships among the prosodic-acoustic features of an utterance, the prosody tags representing its prosodic structure, and the linguistic features of the associated text. By analyzing the parameters of the trained HPM and the prosody-labeling results of all training utterances, we find that the prosodic tags of disfluent speech units have characteristics different from those of normal speech units. This property may be used in ASR to help identify disfluent speech units. Lastly, we study the use of speech prosody in acoustic modeling. A prosody-dependent acoustic model is constructed, which is intended for use in improving spontaneous-speech recognition in future work.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070160227
http://hdl.handle.net/11536/75615
Appears in Collections: Thesis