Title: | 以語料為基礎的中文專有名詞分類之研究 A Corpus-Based Chinese Named-Entity Classification |
Authors: | Cheng-Hui Yeh (葉政輝); Tyne Liang (梁婷); Institute of Computer Science and Engineering |
Keywords: | Chinese nouns; Named-Entity; Classification |
Issue Date: | 2002 |
Abstract: | Named-entity classification is an important part of natural language processing, particularly for document processing and semantic understanding. Correctly identified named entities can serve as index terms in web or full-text retrieval, and they help reveal relationships among persons, events, locations, and times in a document. In this thesis, we use a Chinese character probability model, built from characters that commonly occur in person names, to classify Chinese person names. In addition, we propose an adjacent co-occurring word model and a preceding/following part-of-speech model, which combine the words and part-of-speech tags that commonly appear before and after named entities, to identify and classify Chinese person names and organization names. After training, experiments achieve 89% precision and 99% recall on Chinese person names, and 89% precision and 84% recall on organization names. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT910394002 http://hdl.handle.net/11536/70174 |
Appears in Collections: | Thesis |
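
The character-probability idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual model: the smoothed log-odds formulation, the function names `train_char_model` and `score_candidate`, and the toy training data are all assumptions made for illustration. The sketch scores a candidate string by how much more likely its characters are to appear in person names than in general text.

```python
import math
from collections import Counter


def train_char_model(names, background_text, alpha=1.0):
    """Return a function giving the smoothed log-odds that a character
    comes from a person name rather than from general text.
    (Hypothetical formulation; the thesis may define its model differently.)"""
    name_counts = Counter(ch for name in names for ch in name)
    bg_counts = Counter(background_text)
    vocab_size = len(set(name_counts) | set(bg_counts)) + 1  # +1 for unseen characters
    name_total = sum(name_counts.values())
    bg_total = sum(bg_counts.values())

    def log_odds(ch):
        # Add-alpha smoothing keeps unseen characters from producing log(0).
        p_name = (name_counts[ch] + alpha) / (name_total + alpha * vocab_size)
        p_bg = (bg_counts[ch] + alpha) / (bg_total + alpha * vocab_size)
        return math.log(p_name / p_bg)

    return log_odds


def score_candidate(candidate, log_odds):
    """Average per-character log-odds; positive values suggest a name-like string."""
    if not candidate:
        return 0.0
    return sum(log_odds(ch) for ch in candidate) / len(candidate)


# Toy data, invented for this example only.
toy_names = ["陳志明", "林美玲", "王志強", "陳美惠"]
toy_background = "今天天氣很好我們去公園散步然後回家吃飯"
log_odds = train_char_model(toy_names, toy_background)

name_score = score_candidate("陳美玲", log_odds)
non_name_score = score_candidate("公園", log_odds)
print(name_score > non_name_score)
```

In practice such a score would presumably be combined with the contextual evidence the abstract describes (neighboring words and part-of-speech tags) rather than used alone, since character statistics by themselves cannot separate names from other words that share the same characters.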