A Further Study on Post-Processing of Continuous Mandarin Speech Recognition

Full metadata record

DC Field	Value	Language
dc.contributor.author	張志豪	en_US
dc.contributor.author	Zhi Hao Zhang	en_US
dc.contributor.author	陳信宏	en_US
dc.contributor.author	Xin Hong Chen	en_US
dc.date.accessioned	2014-12-12T01:14:59Z	-
dc.date.available	2014-12-12T01:14:59Z	-
dc.date.issued	2008	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT009513559	en_US
dc.identifier.uri	http://hdl.handle.net/11536/38403	-
dc.description.abstract	本論文分成兩個部份，第一部分探討建立語言模型時所使用的文字資料庫的適用性，觀察文字資料庫的內容是否適合建立語言模型，刪除不適合的內容及更正錯誤文字，希望能提升整體的辨識率。第二部分是針對辨認結果，以有意義的長詞為目標，而非只和辨認用詞典中的詞比對，為此我們多考慮了二種構詞，包括數量複合詞及人名，結果辨認率下降許多，顯示原先辨識結果將許多有意義的這二類長詞辨識成意義不完整或錯誤的短詞。由於辨認用詞典無法包含所有構詞，我們因此嘗試將常被用來構成這些詞的一字詞或subword加入詞典，希望這些構詞被辨認成正確的短詞串，以便在未來經後處理產生正確構詞。實驗結果顯示以subword作為構詞成分較一字詞為佳。	zh_TW
dc.description.abstract	This thesis is divided into two parts. The first part examines the suitability of the text corpus used to build the language model: we check whether the corpus content is appropriate for language modeling, delete unsuitable content, and correct erroneous text, with the aim of improving the overall recognition rate. The second part evaluates the recognition results against meaningful long words rather than only against the words in the recognition lexicon. To this end we additionally consider two types of compound words: determiner-measure compounds and personal names. Under this criterion the recognition rate drops considerably, which shows that the original recognizer output many meaningful long words of these two types as short words whose meaning is incomplete or wrong. Because the recognition lexicon cannot contain every compound word, we add to the lexicon the single-character words or subwords that frequently form these compounds, so that each compound is recognized as a correct string of short words from which post-processing can later assemble the correct compound. Experimental results show that subwords work better than single-character words as compound components.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	中文連續語音辨認	zh_TW
dc.subject	Continuous Mandarin Speech Recognition	en_US
dc.title	中文連續語音辨認後處理之進一步研究	zh_TW
dc.title	A Further Study on Post-Processing of Continuous Mandarin Speech Recognition	en_US
dc.type	Thesis	en_US
dc.contributor.department	電信工程研究所	zh_TW
Appears in Collections:	Thesis

Files in This Item:

355901.pdf
