基於條件隨機場之中文樹狀結構標記 (A Conditional Random Field-based Chinese Tree Structure Labeling)

Full metadata record

DC Field	Value	Language
dc.contributor.author	康富傑	en_US
dc.contributor.author	Kang, Fu-Jie	en_US
dc.contributor.author	王逸如	en_US
dc.contributor.author	Wang, Yih-Ru	en_US
dc.date.accessioned	2015-11-26T01:02:15Z	-
dc.date.available	2015-11-26T01:02:15Z	-
dc.date.issued	2015	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT070260326	en_US
dc.identifier.uri	http://hdl.handle.net/11536/127286	-
dc.description.abstract	中文句法樹狀結構剖析在中文的自然語言處理上是非常重要的工作，在中文裡，詞為有意義的最小語言單位，中文的句子是由多個詞所組成，對於詞與詞之間該如何連接、哪些詞又需要優先被連接即為結構剖析的工作。近年來的研究傾向於使用機器學習的方式來進行中文斷詞及剖析，傳統的中文結構樹對於樹狀結構的標示約能達到70%左右的F-measure，在本研究中的第一部分使用條件隨機場來進行中文句法結構的訓練及標記，所用的訓練語料為使用中研院詞庫小組剖析系統標記的剖析結果，由於剖析結果中並不是完全正確，將部分錯誤的剖析結果經過人工修改後，使用條件隨機場進行模型訓練及標記，對測試語料的結果評估可以達到80%以上的F-measure。由過去文獻中顯示，中文句法結構及中文語音韻律結構有一定程度的關係，本研究的第二部分根據停頓時長大小定義一停頓韻律樹，並使用與第一部分相同的機器學習方式來標記停頓韻律結構樹，標記結果顯示，對於較容易判別的長停頓B3、B4分別能達到57.80%及81.25%的正確率，而較難判定的短停頓B2-2則僅有35.54%。	zh_TW
dc.description.abstract	Syntactic tree parsing is an important task in Chinese natural language processing (NLP). In Mandarin, the word is the smallest meaningful linguistic unit, and a sentence is composed of multiple words. Parsing therefore determines how words are connected and which words should be connected first. Recent studies tend to use machine learning for Chinese word segmentation and parsing. Traditional Chinese parsing achieves an F-measure of about 70% on tree structure labeling. In the first part of this study, we use a conditional random field (CRF) to train and label Chinese syntactic structure. The training data are parses produced by the CKIP parser. Because these parses are not entirely correct, some erroneous parses were manually corrected before model training. On the test data, the CRF-based model achieves an F-measure above 80%. Previous work shows that Chinese syntactic structure is related to some extent to Mandarin prosodic structure. In the second part, we therefore define a prosodic break tree based on the pause duration between words and label it with the same method used for the syntax tree. The model achieves accuracies of 57.80% and 81.25% for the long breaks B3 and B4, which are easier to identify, but only 35.54% for the short break B2-2, which is harder to identify.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	條件隨機場	zh_TW
dc.subject	中文句法結構樹	zh_TW
dc.subject	停頓標記	zh_TW
dc.subject	Conditional Random Field	en_US
dc.subject	Chinese syntax tree	en_US
dc.subject	Break prediction	en_US
dc.title	基於條件隨機場之中文樹狀結構標記	zh_TW
dc.title	A Conditional Random Field-based Chinese Tree Structure Labeling	en_US
dc.type	Thesis	en_US
dc.contributor.department	電信工程研究所	zh_TW
Appears in Collections:	Thesis