Title: 中文連續語音辨認之進一步研究
A Further Study on Continuous Mandarin Speech Recognition
Authors: 張獻文
陳信宏
Institute of Communications Engineering
Keywords: language model; speech recognition; POS; OOV
Issue Date: 2006
Abstract: This thesis has two parts. The first examines the contents of the lexicon: its entries are inspected manually, and words judged unsuitable are removed. Pruning the lexicon in turn raises the out-of-vocabulary (OOV) problem, so we examine how the OOV rate affects recognition results. To address OOV, we re-segment words, splitting long words into shorter ones, which lowers the OOV rate and in turn raises the recognition rate. The second part builds a part-of-speech (POS) based language model, tries adding word-length information, and mixes POS-based and word-based language models to improve recognition. Finally, linear interpolation is used to combine two different types of language model, which improves the recognition rate further. Compared with the baseline system, the final syllable recognition rate improves by about 2.5%.
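The three quantities the abstract relies on (OOV rate, re-segmentation of long words into shorter ones, and linear interpolation of two language models) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy lexicon, the greedy longest-match splitting strategy, and the interpolation weight `lam` are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Set


def oov_rate(tokens: Iterable[str], lexicon: Set[str]) -> float:
    """Fraction of tokens that are not entries of the lexicon."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in lexicon) / len(tokens)


def resegment(word: str, lexicon: Set[str]) -> List[str]:
    """Split a word into shorter lexicon entries (assumed strategy:
    greedy forward longest match, falling back to single characters)."""
    if word in lexicon:
        return [word]
    pieces: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def interpolate(p_word: float, p_pos: float, lam: float) -> float:
    """Linear interpolation: lam * P_word + (1 - lam) * P_pos."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_word + (1.0 - lam) * p_pos


# Hypothetical toy data, not taken from the thesis.
lexicon = {"語音", "辨識", "系統"}
tokens = ["語音", "辨識系統", "語音辨識"]

rate_before = oov_rate(tokens, lexicon)
resegmented = [piece for t in tokens for piece in resegment(t, lexicon)]
rate_after = oov_rate(resegmented, lexicon)

# Hypothetical probabilities from a word-based and a POS-based model.
p_mixed = interpolate(p_word=0.02, p_pos=0.08, lam=0.7)
```

In this toy case two of the three tokens are out of vocabulary before re-segmentation and none afterwards, which is the effect the abstract describes. In practice the interpolation weight would be tuned on held-out data rather than fixed by hand.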
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009413532
http://hdl.handle.net/11536/80795
Appears in Collections: Thesis


Files in This Item:

  1. 353201.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.