Language Model and Word Semantic Labeling in Chinese

Full metadata record

DC Field	Value	Language
dc.contributor.author	蔡易儒	en_US
dc.contributor.author	Cai, Yih-Ru	en_US
dc.contributor.author	王逸如	en_US
dc.contributor.author	Wang, Yih-Ru	en_US
dc.date.accessioned	2015-11-26T01:02:15Z	-
dc.date.available	2015-11-26T01:02:15Z	-
dc.date.issued	2015	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT070260260	en_US
dc.identifier.uri	http://hdl.handle.net/11536/127287	-
dc.description.abstract	論文所探討之主題有二，一是改進語言模型，而改進語言模型包含斷詞器之改進、文字正規化、選詞等等改進，而其中為降低語言模型之複雜度，選詞我們希望可以選出常用詞彙做訓練，並且剔除部分詞頻高但分佈不均之詞彙，訓練出來之語言模型將其轉成加權有限狀態轉換機再用於音節串辨認，相較於傳統辨認系統，加權有限狀態轉換機有模型小、辨認時間短等等優點，最後我們以辨認率及複雜度之高低評測語言模型之優劣。另外，我們亦希望可從文字語料中汲取更深層的詞彙意義，即藉由並且由《廣義知網》中固有詞彙的資訊做為樣本點並藉由文字語料訓練詞向量來賦予每一詞彙(樣本點)一詞向量，並透過最近鄰居演算法求詞與詞之間的餘弦相似度，再對於每一新詞彙自動標記其語意，除標記語意之外，我們亦對於《廣義知網》中詞彙之資訊、標記之正確率等等做更深入的探討。	zh_TW
dc.description.abstract	This thesis addresses two topics. The first is improving Chinese language models, including improvements to the word segmenter, text normalization, and lexicon selection. To reduce language model complexity, we select commonly used words for training and discard some words that are frequent but unevenly distributed. The trained language model is converted into a weighted finite-state transducer (WFST) and applied to syllable sequence recognition; compared with conventional recognition systems, the WFST is smaller and recognition is faster. We evaluate the language models by recognition rate and complexity. The second topic is extracting deeper word-level information from the text corpus: using the words already in E-HowNet as training examples, we train word vectors on the text corpus, compute the cosine similarity between words, and apply the K-nearest neighbors algorithm to assign one or more semantic labels to each new word. We also examine the word information in E-HowNet and the accuracy of the semantic labeling in more detail.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	語言模型	zh_TW
dc.subject	K最近鄰居演算法	zh_TW
dc.subject	廣義知網	zh_TW
dc.subject	Language Model	en_US
dc.subject	KNN	en_US
dc.subject	E-HowNet	en_US
dc.title	中文語言模型及語意概念自動標示	zh_TW
dc.title	Language Model and Word Semantic Labeling in Chinese	en_US
dc.type	Thesis	en_US
dc.contributor.department	電信工程研究所	zh_TW
Appears in Collections:	Thesis
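
The abstract describes labeling new words by finding their nearest neighbors, under cosine similarity, among word vectors of words already annotated in E-HowNet. The following is only a minimal sketch of that general idea and not the thesis's implementation: the three-dimensional vectors, the labels "animal" and "vehicle", the names SEED, cosine, and knn_label, and the choice of k = 3 with a simple majority vote are all illustrative assumptions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_label(query, labeled, k=3):
    """Return the majority label among the k most similar labeled vectors.

    `labeled` is a list of (vector, label) pairs; both the data layout and
    the majority-vote rule are assumptions made for this sketch.
    """
    ranked = sorted(labeled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical seed words with toy vectors and made-up semantic labels.
SEED = [
    ([0.9, 0.1, 0.0], "animal"),
    ([0.8, 0.2, 0.1], "animal"),
    ([0.1, 0.9, 0.2], "vehicle"),
    ([0.0, 0.8, 0.3], "vehicle"),
]

if __name__ == "__main__":
    # A new, unlabeled word vector is assigned the label of its neighbors.
    print(knn_label([0.85, 0.15, 0.05], SEED, k=3))
```

In practice the vectors would come from a model trained on the text corpus, and a word could receive more than one label, for example by keeping every label whose vote count passes a threshold.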