Title: 使用階層式語言模型之大詞彙國語辨認系統
Large-Vocabulary Mandarin Speech Recognition using Hierarchical Language Model
Authors: 楊雲舒
Yang, Yun-Shu
王逸如
Wang, Yih-Ru
Institute of Communications Engineering
Keywords: Large-Vocabulary Speech Recognition; Hierarchical Language Model; OOV
Issue Date: 2010
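Abstract: This thesis addresses three classes of Chinese words: determinative-measure compounds, person names, and affixed words. Exploiting the regular structure of these three classes, it decomposes them so that a smaller number of word-formation units covers all words in the three classes, which reduces their OOV problem. Unlike the conventional practice of evaluating recognition mainly at the character level (character error rate), this study aims to evaluate speech recognition performance on longer, meaningful words or word chunks. Based on how these words behave, grammatical and semantic information is used to build language models that describe the three classes in finer detail, and language model scores are reassigned to find the best recognition result and improve recognition performance. Analysis of the results shows that the method can indeed use the language models of these three word classes to describe their characteristics comprehensively, and thereby recognize words, and even word chunks, that carry more semantic content. Future work will draw on the structure, semantics, and syntax of word chunks to obtain more information and to build more systematic and richer methods to assist recognition.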
It is difficult to list every word in a recognizer's vocabulary for large-vocabulary speech recognition, so we present an approach for modeling out-of-vocabulary (OOV) words. In this thesis, we address the OOV problem for three types of Mandarin words: determinative-measure compounds, person names, and affixed words. Words are converted into sub-word units, which are then searched for in the recognition hypotheses; these flexible sub-word units cover more new words. The main focus of this study is to use grammatical and semantic information to construct a hierarchical language model for these three word types. The language model is added to improve recognition performance, with the aim of recognizing longer, more meaningful units such as words and word chunks.
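As an illustration of the decomposition idea in the abstract, the following is a minimal sketch, not the thesis's implementation. It splits a determinative-measure compound into sub-word units and scores it with a two-level factorization, log P(word) = log P(class | history) + sum of log P(unit | class). The unit inventories, the function names `split_dm_compound` and `hierarchical_log_prob`, and the unigram-within-class factorization are all assumptions made for this example; the abstract does not specify the actual units or model form.

```python
import math

# Hypothetical unit inventories for illustration only;
# the thesis's actual word-formation units are not listed in this record.
DETERMINATIVES = {"一", "兩", "三", "十", "百"}
MEASURES = {"個", "本", "公斤", "年"}


def split_dm_compound(word):
    """Split a determinative-measure compound into (determinative units, measure).

    Assumed pattern: one or more determinative characters followed by a
    known measure word. Returns None when the word does not fit the pattern.
    """
    # Try longer measure words first so that "公斤" is matched as one unit.
    for measure in sorted(MEASURES, key=len, reverse=True):
        if word.endswith(measure) and len(word) > len(measure):
            head = word[: -len(measure)]
            if all(ch in DETERMINATIVES for ch in head):
                return list(head), measure
    return None


def hierarchical_log_prob(word, class_log_prob, unit_log_probs):
    """Score a compound as class log-probability plus unit log-probabilities.

    class_log_prob stands in for the upper-level model's score for the
    class token; unit_log_probs maps each unit to log P(unit | class).
    Returns None when the word cannot be decomposed.
    """
    parts = split_dm_compound(word)
    if parts is None:
        return None
    determinatives, measure = parts
    units = determinatives + [measure]
    return class_log_prob + sum(unit_log_probs[u] for u in units)


if __name__ == "__main__":
    # Uniform toy probabilities, chosen only to make the example runnable.
    toy_units = {u: math.log(0.1) for u in DETERMINATIVES | MEASURES}
    print(split_dm_compound("三十本"))
    print(hierarchical_log_prob("三十本", math.log(0.5), toy_units))
```

With this kind of factorization, a compound never seen in training can still receive a score as long as each of its units is in the inventory, which is the sense in which a small unit set can cover a large class of words.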
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079713543
http://hdl.handle.net/11536/44563
Appears in Collections: Thesis


Files in This Item:

  1. 354301.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.