Full metadata record
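DC Field | Value | Language
dc.contributor.author | 康富傑 | en_US
dc.contributor.author | Kang, Fu-Jie | en_US
dc.contributor.author | 王逸如 | en_US
dc.contributor.author | Wang, Yih-Ru | en_US
dc.date.accessioned | 2015-11-26T01:02:15Z | -
dc.date.available | 2015-11-26T01:02:15Z | -
dc.date.issued | 2015 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT070260326 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/127286 | -
dc.description.abstract | Parsing the syntactic tree structure of Chinese sentences is an important task in Chinese natural language processing. In Chinese, the word is the smallest meaningful linguistic unit, and a sentence is composed of multiple words; deciding how words connect to one another and which words should be connected first is the job of structural parsing. Recent research tends to use machine learning for Chinese word segmentation and parsing. Traditional Chinese parsers reach an F-measure of roughly 70% on tree-structure labeling. In the first part of this study, a conditional random field (CRF) is used to train and label Chinese syntactic structure. The training corpus consists of parses produced by the parsing system of the Academia Sinica CKIP group (中研院詞庫小組). Because these parses are not entirely correct, some of the erroneous parses were manually corrected before the CRF model was trained and used for labeling; evaluated on the test corpus, the results reach an F-measure above 80%. Previous literature shows that Chinese syntactic structure and Mandarin prosodic structure are related to some degree. The second part of this study therefore defines a pause prosody tree according to pause duration and labels it with the same machine learning method as in the first part. The results show accuracies of 57.80% and 81.25% for the more easily identified long pauses B3 and B4, respectively, but only 35.54% for the harder-to-identify short pause B2-2. | zh_TW
dc.description.abstract | In Chinese natural language processing (NLP), syntactic tree structure parsing is an important topic. In Mandarin, the smallest meaningful unit is the word, and a sentence is composed of many words. Deciding how the words connect and which should be connected first is therefore the role of parsing. Recent studies tend to use machine learning for parsing. Traditional Chinese parsing achieves an F-measure of about 70%. In our system, we train and label tree structures with a conditional random field (CRF). The training data are parses produced by the CKIP parser; we correct the incorrect parses before training the model. Using the CRF-based model to label the test data achieves an F-measure of over 80%. Past work shows that Mandarin syntax is related to Mandarin prosody, so we define a prosodic break tree based on the pause duration between words and label it with the same method used for the syntax tree. We achieve accuracies of 57.80% and 81.25% for the long pauses B3 and B4, but only 35.54% for the short pause B2-2. | en_US
dc.language.iso | zh_TW | en_US
dc.subject | 條件隨機場 (conditional random field) | zh_TW
dc.subject | 中文句法結構樹 (Chinese syntactic structure tree) | zh_TW
dc.subject | 停頓標記 (pause labeling) | zh_TW
dc.subject | Conditional Random Field | en_US
dc.subject | Chinese syntax tree | en_US
dc.subject | Break prediction | en_US
dc.title | 基於條件隨機場之中文樹狀結構標記 (Chinese Tree Structure Labeling Based on Conditional Random Fields) | zh_TW
dc.title | A Conditional Random Field-based Chinese Tree Structure Labeling | en_US
dc.type | Thesis | en_US
dc.contributor.department | 電信工程研究所 (Institute of Communications Engineering) | zh_TW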
Appears in collections: Theses