Title: Acoustic Modeling and Prosody Modeling for Mandarin Spontaneous-Speech Recognition
Authors: Wu, Sheng-Feng (吴声锋)
Chen, Sin-Horng (陈信宏)
Institute of Communications Engineering
Keywords: Spontaneous Speech; Speech Recognition
Issue Date: 2014
Abstract: This thesis addresses two aspects of Mandarin spontaneous-speech recognition: acoustic modeling and prosody modeling. Spontaneous speech exhibits many phenomena that are absent from read speech. For acoustic modeling, we use the MCDC database to train context-dependent triphone HMMs that cover both normal speech and particular speech, the latter composed of particles and paralinguistic events. For prosody modeling, we build a spontaneous-speech prosodic model on the basis of the Hierarchical Prosodic Model (HPM) previously proposed for read Mandarin speech. Fluent and disfluent speech segments are modeled separately. The modeling units of fluent segments are syllables and syllable-like particles, while those of disfluent segments are the particles, markers, and similar units that produce hesitation, stuttering, and pauses. The HPM comprises 11 sub-models that describe the relationships among the prosodic-acoustic features of an utterance, the prosody tags representing its prosodic structure, and the linguistic features of the associated text. By analyzing the parameters of the trained HPM and the prosody-labeling results of all training utterances, we find that the prosody tags of disfluent speech units behave differently from those of normal speech units, a property that may be used in ASR to help detect disfluent units. Lastly, we study the use of prosodic information in acoustic modeling: a prosody-dependent acoustic model is constructed, with the aim of improving recognition of these spontaneous-speech phenomena in future work.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070160227
http://hdl.handle.net/11536/75615
Appears in Collections: Thesis