Full metadata record
DC Field | Value | Language
dc.contributor.author | 施並格 | en_US
dc.contributor.author | Ping-Ke Shih | en_US
dc.contributor.author | 梁婷 | en_US
dc.contributor.author | Tyne Liang | en_US
dc.date.accessioned | 2014-12-12T02:04:23Z | -
dc.date.available | 2014-12-12T02:04:23Z | -
dc.date.issued | 2003 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT009123535 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/52902 | -
dc.description.abstract | In general, semantic recognition of named entities is a basic and important task in automating the construction of domain knowledge bases. Recognition methods of this kind fall into two types, rule-based and statistical, and in this thesis we examine how each performs in the biological domain. The rule-based approach identifies protein names from core terms, function terms, and predefined terms together with part-of-speech tags, then applies six rules to improve system performance; tested on the GENIA and SwissProt Reference corpora, the rule-based system achieves F-scores of 52% and 51%, respectively. The statistical approach uses extracted internal, external, and global features on top of a concise Markov model, combined with a back-off probability model to address data sparseness; tested on the same GENIA and SwissProt Reference corpora, the statistical system achieves a 77% F-score on both. In addition, we use inductively derived heuristic rules to recover terms omitted within term variants, achieving 89% recall and 69% precision. | zh_TW
dc.description.abstract | Named Entity Recognition (NER) is an essential task in knowledge acquisition, and it has recently been widely applied to biomedical entity extraction. In this thesis, we propose automatic protein entity recognition based on both rule-based and statistical approaches. The rule-based approach relies on core terms, function terms, predefined terms, and part-of-speech tags, and then applies six rules to boost performance. In experiments on the GENIA and SwissProt Reference corpora, the rule-based approach yields F-scores of 52% and 51%, respectively. The statistical approach is based on a concise Hidden Markov Model, with back-off models used to overcome the data sparseness problem. It uses not only internal, external, and global features but also the output of the rule-based approach to identify protein entities, and it yields a 77% F-score on both the GENIA and SwissProt Reference corpora. In addition, we use heuristic rules to mine hidden named entities and expand them out of coordination variants; this term variant resolution system yields 89% recall and 69% precision. | en_US
dc.language.iso | en_US | en_US
dc.subject | Named entity recognition | zh_TW
dc.subject | Biomedicine | zh_TW
dc.subject | Markov model | zh_TW
dc.subject | Named Entity Extraction | en_US
dc.subject | Biomedical | en_US
dc.subject | Hidden Markov Model | en_US
dc.title | Automatic Recognition of Protein Names in Biological Corpora | zh_TW
dc.title | Automatic Protein Entities Recognition from PubMed Corpus | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Computer Science and Engineering | zh_TW
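The abstract names the ingredients of the rule-based approach (core terms, function terms, predefined terms, part-of-speech tags) but does not list the six rules. The Python sketch below is a hypothetical illustration of how such ingredients can combine; the lexicons `FUNCTION_TERMS` and `PREDEFINED_TERMS`, the `is_core` test, and the two rules shown are assumptions for illustration, not the thesis's actual rules.

```python
import re

# Hypothetical lexicons; the thesis's real term lists are not given in this record.
FUNCTION_TERMS = {"protein", "receptor", "kinase", "factor", "enzyme"}
PREDEFINED_TERMS = {"p53", "insulin"}


def is_core(token):
    """Assumed core-term test: a predefined name, or an alphanumeric/hyphen
    token that contains a digit or at least two capital letters."""
    if token.lower() in PREDEFINED_TERMS:
        return True
    if not re.fullmatch(r"[A-Za-z0-9-]+", token):
        return False
    return any(c.isdigit() for c in token) or sum(c.isupper() for c in token) >= 2


def find_protein_spans(tagged):
    """Return candidate protein names from a list of (token, POS) pairs."""
    spans = []
    i = 0
    while i < len(tagged):
        token, pos = tagged[i]
        # Rule A (assumed): a candidate starts at a core term tagged as a noun.
        if is_core(token) and pos.startswith("NN"):
            start = i
            # Extend the span over directly following core terms.
            while i + 1 < len(tagged) and is_core(tagged[i + 1][0]):
                i += 1
            # Rule B (assumed): absorb one trailing function term, e.g. "p53 protein".
            if i + 1 < len(tagged) and tagged[i + 1][0].lower() in FUNCTION_TERMS:
                i += 1
            spans.append(" ".join(tok for tok, _ in tagged[start:i + 1]))
        i += 1
    return spans
```

For example, `find_protein_spans([("The", "DT"), ("p53", "NN"), ("protein", "NN"), ("binds", "VBZ"), ("IL-2", "NN"), (".", ".")])` returns `["p53 protein", "IL-2"]`. A shape-only core test like this one will also accept non-protein acronyms such as "DNA", which is the kind of error additional rules would need to filter.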
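The statistical approach is described as a concise Hidden Markov Model with back-off models for sparse data. As a rough, hypothetical illustration (not the thesis's model or feature set), the sketch below trains a bigram HMM over assumed `B-PROT`/`I-PROT`/`O` tags, smooths transitions with add-one counts, and mixes each word's emission probability with that of a coarse word-shape class, so an unseen word falls back entirely on its shape. The tag names, the `shape` classes, the weight `lam`, and the toy `TRAIN` data are all assumptions.

```python
import math
from collections import Counter, defaultdict

# Toy training data (hypothetical): each sentence is a list of (word, tag) pairs.
TRAIN = [
    [("IL-2", "B-PROT"), ("activates", "O"), ("p53", "B-PROT"), (".", "O")],
    [("The", "O"), ("p53", "B-PROT"), ("protein", "I-PROT"), ("binds", "O"),
     ("DNA", "O"), (".", "O")],
]


def shape(word):
    """Coarse orthographic class used as the fallback feature (assumed classes)."""
    if any(c.isdigit() for c in word) and any(c.isupper() for c in word):
        return "UPPER_DIGIT"
    if word.isupper():
        return "ALLCAPS"
    if word[:1].isupper():
        return "INITCAP"
    return "LOWER"


class BackoffHMM:
    N_SHAPES = 4  # number of classes returned by shape()

    def __init__(self, sentences, lam=0.7):
        self.lam = lam
        self.trans = defaultdict(Counter)       # prev tag -> next tag counts
        self.emit_word = defaultdict(Counter)   # tag -> lowercased word counts
        self.emit_shape = defaultdict(Counter)  # tag -> shape-class counts
        self.tags = set()
        for sent in sentences:
            prev = "<s>"
            for word, tag in sent:
                self.trans[prev][tag] += 1
                self.emit_word[tag][word.lower()] += 1
                self.emit_shape[tag][shape(word)] += 1
                self.tags.add(tag)
                prev = tag

    def log_trans(self, prev, tag):
        counts = self.trans[prev]
        # Add-one smoothing over the tag set.
        return math.log((counts[tag] + 1) / (sum(counts.values()) + len(self.tags)))

    def log_emit(self, tag, word):
        words, shapes = self.emit_word[tag], self.emit_shape[tag]
        p_word = words[word.lower()] / sum(words.values())  # 0 for unseen words
        p_shape = (shapes[shape(word)] + 1) / (sum(shapes.values()) + self.N_SHAPES)
        # Linear interpolation: unseen words rely only on the shape term.
        return math.log(self.lam * p_word + (1 - self.lam) * p_shape)

    def viterbi(self, words):
        """Return the most probable tag sequence for a list of words."""
        if not words:
            return []
        tags = sorted(self.tags)
        # best[i][t] = (best log score ending in tag t at position i, previous tag)
        best = [{t: (self.log_trans("<s>", t) + self.log_emit(t, words[0]), None)
                 for t in tags}]
        for i in range(1, len(words)):
            row = {}
            for t in tags:
                score, prev = max((best[i - 1][p][0] + self.log_trans(p, t), p)
                                  for p in tags)
                row[t] = (score + self.log_emit(t, words[i]), prev)
            best.append(row)
        path = [max(tags, key=lambda t: best[-1][t][0])]
        for i in range(len(words) - 1, 0, -1):
            path.append(best[i][path[-1]][1])
        return path[::-1]
```

With this toy data, `BackoffHMM(TRAIN).viterbi(["IL-4", "activates", "p53", "."])` tags the unseen word "IL-4" as `B-PROT` because its shape class matches "IL-2". Strictly speaking this is interpolation rather than Katz-style back-off; it is used here only because it is short and never assigns zero probability.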
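The abstract also mentions heuristic rules that expand hidden named entities out of coordination variants. The record does not give those rules, so the Python sketch below shows two hypothetical regular-expression heuristics of that general kind: a shared prefix elided from later conjuncts ("IL-2 and -4") and a shared head elided from earlier conjuncts ("B- and T-cell"). The patterns and the function name `expand_coordination` are assumptions.

```python
import re

# Assumed heuristic 1: shared prefix elided on later conjuncts,
# e.g. "IL-2 and -4" or "IL-2, -4, and -6".
SHARED_PREFIX = re.compile(
    r"\b([A-Za-z]+)-(\w+)((?:\s*,\s*-\w+)*)\s*(?:,\s*)?(?:and|or)\s+-(\w+)")

# Assumed heuristic 2: shared head elided on earlier conjuncts, e.g. "B- and T-cell".
SHARED_SUFFIX = re.compile(r"\b(\w+)-\s+(?:and|or)\s+(\w+)-(\w+)")


def expand_coordination(text):
    """Return one list of expanded names per coordinated phrase found in text."""
    expansions = []
    for m in SHARED_PREFIX.finditer(text):
        prefix = m.group(1)
        # First suffix, any middle "-x" suffixes, then the final suffix.
        suffixes = [m.group(2)] + re.findall(r"-(\w+)", m.group(3)) + [m.group(4)]
        expansions.append([f"{prefix}-{s}" for s in suffixes])
    for m in SHARED_SUFFIX.finditer(text):
        head = m.group(3)
        expansions.append([f"{m.group(1)}-{head}", f"{m.group(2)}-{head}"])
    return expansions
```

For example, `expand_coordination("IL-2, -4, and -6")` returns `[["IL-2", "IL-4", "IL-6"]]`. Precision in practice would depend on further filtering, since patterns like these also fire on non-protein coordinations.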
Appears in Collections: Theses


Files in This Item:

  1. 353501.pdf
