Full metadata record
DC FieldValueLanguage
dc.contributor.author蔡易儒en_US
dc.contributor.authorCai, Yih-Ruen_US
dc.contributor.author王逸如en_US
dc.contributor.authorWang, Yih-Ruen_US
dc.date.accessioned2015-11-26T01:02:15Z-
dc.date.available2015-11-26T01:02:15Z-
dc.date.issued2015en_US
dc.identifier.urihttp://140.113.39.130/cdrfb3/record/nctu/#GT070260260en_US
dc.identifier.urihttp://hdl.handle.net/11536/127287-
dc.description.abstract論文所探討之主題有二,一是改進語言模型,而改進語言模型包含斷詞器之改進、文字正規化、選詞等等改進,而其中為降低語言模型之複雜度,選詞我們希望可以選出常用詞彙做訓練,並且剔除部分詞頻高但分佈不均之詞彙,訓練出來之語言模型將其轉成加權有限狀態轉換機再用於音節串辨認,相較於傳統辨認系統,加權有限狀態轉換機有模型小、辨認時間短等等優點,最後我們以辨認率及複雜度之高低評測語言模型之優劣。 另外,我們亦希望可從文字語料中汲取更深層的詞彙意義,即藉由並且由《廣義知網》中固有詞彙的資訊做為樣本點並藉由文字語料訓練詞向量來賦予每一詞彙(樣本點)一詞向量,並透過最近鄰居演算法求詞與詞之間的餘弦相似度,再對於每一新詞彙自動標記其語意,除標記語意之外,我們亦對於《廣義知網》中詞彙之資訊、標記之正確率等等做更深入的探討。zh_TW
dc.description.abstractThis thesis addresses two topics. The first is improving language models, including improvements to the word segmenter, text normalization, and vocabulary selection. To reduce language model complexity, we select commonly used words for training and discard some words that are frequent but unevenly distributed. The trained language model is converted into a weighted finite-state transducer (WFST) and applied to syllable sequence recognition; compared with conventional recognition systems, the WFST is smaller and decodes faster. Finally, we evaluate the language models by recognition rate and complexity. The second topic is extracting deeper word meaning from the text corpus: we take the words already in E-HowNet as sample points, train word vectors on the text corpus to assign each word (sample point) a vector, compute the cosine similarity between words with the K-nearest neighbors algorithm, and automatically label each new word with one or more semantic concepts. We also examine in more depth the word information in E-HowNet and the accuracy of the semantic labeling.en_US
dc.language.isozh_TWen_US
dc.subject語言模型zh_TW
dc.subjectK最近鄰居演算法zh_TW
dc.subject廣義知網zh_TW
dc.subjectLanguage Modelen_US
dc.subjectKNNen_US
dc.subjectE-HowNeten_US
dc.title中文語言模型及語意概念自動標示zh_TW
dc.titleLanguage Model and Word Semantic Labeling in Chineseen_US
dc.typeThesisen_US
dc.contributor.department電信工程研究所zh_TW
Appears in Collections: Thesis