Title: Relation Extraction from Biological Literature (從生物文獻中萃取物件之關係)
Authors: Chih-Ming Yu (游志銘)
Tyne Liang (梁婷)
Institute of Computer Science and Engineering (資訊科學與工程研究所)
Keywords: Information Extraction (資訊萃取); Protein (蛋白質); Disease (疾病)
Issue Date: 2002
Abstract: This thesis presents PADIES, an information extraction (IE) system that extracts biological relations from biological literature and converts them into template form. In the IE module, we implemented a weighted Naive Bayes classifier that assigns each sentence to one of three classes: Yes (a relation is present), No (no relation), or Ambiguous. We also use lexicons and a noun phrase (NP) chunker to select candidate noun phrases as the arguments of biological events. The core of the IE module is a set of proposed heuristic rules that, combined with conjunctions, can handle the extraction of multiple relations from a sentence. PADIES is therefore a rule-based IE system. In the relation extraction experiment, recall was 79.30% and precision was 85.61%. In a further relation extraction experiment on an additional 100 collected sentences, average recall was 78.71% and average precision was 83%.
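The sentence classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights in the weighted Naive Bayes classifier are defined, so the code below assumes one simple possibility: a per-token weight that scales that token's log-likelihood contribution. The function names, the toy training sentences, and the weight values are all hypothetical and are not taken from the thesis or from PADIES.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Estimate class priors and per-class token counts.

    examples: list of (tokens, label) pairs, where label is
    "Yes", "No", or "Ambiguous" (the three classes named in the abstract).
    """
    class_counts = Counter(label for _, label in examples)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(class_counts.values())
    priors = {c: n / total for c, n in class_counts.items()}
    return priors, token_counts, vocab


def classify(tokens, model, weights=None):
    """Return the class with the highest weighted log-posterior score.

    weights maps a token to a multiplier on its log-likelihood term
    (default 1.0). This weighting scheme is an assumption made for
    illustration, not the scheme used in the thesis.
    """
    priors, token_counts, vocab = model
    weights = weights or {}
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(token_counts[label].values()) + len(vocab)
        score = math.log(prior)
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore tokens never seen in training
            likelihood = (token_counts[label][tok] + 1) / denom
            score += weights.get(tok, 1.0) * math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, for illustration only.
examples = [
    (["protein", "activates", "kinase"], "Yes"),
    (["gene", "causes", "disease"], "Yes"),
    (["samples", "were", "collected"], "No"),
    (["patients", "were", "examined"], "No"),
    (["protein", "may", "affect", "disease"], "Ambiguous"),
]
model = train(examples)

# Give the relation verb "activates" extra influence (hypothetical weight).
print(classify(["protein", "activates", "disease"], model, {"activates": 2.0}))
print(classify(["samples", "were", "examined"], model))
```

In a pipeline like the one the abstract outlines, only sentences labelled Yes (and possibly Ambiguous) would plausibly be passed on to the noun phrase chunker and the heuristic rules; that routing is also an assumption here.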
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT910394104
http://hdl.handle.net/11536/70271
Appears in Collections: Thesis