中文斷詞器之研究 (A Study of Chinese Parser)

Full metadata record

DC Field	Value	Language
dc.contributor.author	唐大任	en_US
dc.contributor.author	Da-Ren Tang	en_US
dc.contributor.author	王逸如	en_US
dc.contributor.author	Yih-Ru Wang	en_US
dc.date.accessioned	2014-12-12T02:28:32Z	-
dc.date.available	2014-12-12T02:28:32Z	-
dc.date.issued	2001	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#NT900435069	en_US
dc.identifier.uri	http://hdl.handle.net/11536/68947	-
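dc.description.abstract	In this thesis we examine several problems that arise in building a word segmenter. We first use segmentation rules and word-formation rules together with a lexicon to help the segmenter segment text, and we build a part-of-speech (POS) bigram model to tag each word with its POS. For compounds, because determinative-measure compounds and four-character reduplications are regular, we combine them with word-formation rules and then use the segmentation rules to choose between the words in the lexicon and the compound. In addition, if the output word sequence contains a combinable prefix or suffix, we use rules to combine it with the following or preceding word into a derived word. The balanced corpus provided by Academia Sinica serves as test data for assessing the segmenter's performance. The segmentation results show that the long words we combine are mostly longer than those in the balanced corpus, and we consider these combined long words reasonable; counting them together with the output that agrees with the balanced corpus, the segmenter's accuracy is about 96%. The remaining errors are mostly caused by proper nouns and incomplete lexicon coverage. The POS tagging accuracy looks good on preliminary inspection, but suitable test data are still needed to measure it more precisely.	zh_TW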
dc.description.abstract	This thesis studies a parser for Chinese. The parser identifies the words in a Chinese sentence and their associated parts of speech (POS). It uses the word matching rules proposed by the Chinese Knowledge Information Processing (CKIP) group, Academia Sinica, together with word combination rules for compounds. In the word matching unit, the first word of the longest and most plausible word chunk is selected. The word combination rules, namely the determinative-measure (DM) compound rules and the reduplication rules, group words into compounds; in this thesis they are applied before word matching in order to resolve some ambiguities in the word matching unit. Prefix/suffix word construction rules are also used in post-processing to further combine words into derived words. Finally, a POS bigram model determines the POS of the parser's output words. The Sinica Corpus published by CKIP was used to evaluate the performance of our system; the average word length produced by our system was larger than that of the CKIP parser. The output of our parser is more suitable for a speech synthesis system.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	斷詞器	zh_TW
dc.subject	Parser	en_US
dc.title	中文斷詞器之研究	zh_TW
dc.title	A Study of Chinese Parser	en_US
dc.type	Thesis	en_US
dc.contributor.department	電信工程研究所	zh_TW
Appears in Collections:	Thesis
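
The abstract describes two data-driven steps: lexicon-based word matching and POS selection with a bigram model. The following is only a minimal sketch of those two ideas, not the thesis's implementation. It assumes plain greedy forward maximum matching in place of CKIP's chunk-based matching rules, omits the DM-compound, reduplication, and prefix/suffix rules entirely, and uses a toy lexicon and made-up probabilities. The names `max_match` and `tag_bigram` are hypothetical.

```python
import math

def max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching (a simplification, assumed here):
    at each position take the longest lexicon entry, falling back to a
    single character when nothing in the lexicon matches."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + size]
            if size == 1 or cand in lexicon:
                words.append(cand)
                i += size
                break
    return words

def tag_bigram(words, start, trans, emit):
    """Viterbi search for the tag sequence that maximises
    P(t1) * prod P(t_i | t_{i-1}) * prod P(w_i | t_i), in log space."""
    if not words:
        return []
    tags = list(start)

    def lp(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t] = (log-probability of the best path ending in tag t, that path)
    best = {t: (lp(start[t]) + lp(emit[t].get(words[0], 0)), [t]) for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            score, path = max(
                ((best[p][0] + lp(trans[p].get(t, 0)), best[p][1]) for p in tags),
                key=lambda x: x[0],
            )
            new[t] = (score + lp(emit[t].get(w, 0)), path + [t])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]

# Toy data, invented purely for illustration.
LEXICON = {"中文", "斷詞", "斷詞器", "研究"}
START = {"N": 0.5, "V": 0.5}
TRANS = {"N": {"N": 0.5, "V": 0.5}, "V": {"N": 0.9, "V": 0.1}}
EMIT = {"N": {"研究": 0.3, "中文": 0.7}, "V": {"研究": 1.0}}

print(max_match("中文斷詞器研究", LEXICON))            # ['中文', '斷詞器', '研究']
print(tag_bigram(["研究", "中文"], START, TRANS, EMIT))  # ['V', 'N']
```

Greedy matching prefers 斷詞器 over the shorter 斷詞 because it tries the longest candidate first. A chunk-based matcher of the kind the abstract attributes to CKIP would look ahead over several words before committing, which this sketch does not attempt.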