Automatic Recognition of Protein Names in Biological Corpora

Full metadata record

DC Field	Value	Language
dc.contributor.author	施並格	en_US
dc.contributor.author	Ping-Ke Shih	en_US
dc.contributor.author	梁婷	en_US
dc.contributor.author	Tyne Liang	en_US
dc.date.accessioned	2014-12-12T02:04:23Z	-
dc.date.available	2014-12-12T02:04:23Z	-
dc.date.issued	2003	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT009123535	en_US
dc.identifier.uri	http://hdl.handle.net/11536/52902	-
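dc.description.abstract	In general, semantic recognition of proper names is a basic and important task in automating the construction of domain knowledge bases. Approaches to this task are either rule-based or statistical. In this thesis, we examine how well each approach performs in the biological domain. The rule-based method uses core terms, function terms, and predefined terms, together with part-of-speech tags, to recognize protein names, and then applies six rules to improve system performance. Tested on the GENIA and SwissProt Reference corpora, the rule-based system achieves F-scores of 52% and 51%, respectively. The statistical method uses extracted internal, external, and global features, is built on a concise Markov model, and uses a back-off probability model to address data sparseness. Tested on the same two corpora, the statistical system achieves an F-score of 77% on both. In addition, we use induced heuristic rules to uncover elided terms in term variants; experiments yield 89% recall and 69% precision.	zh_TW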
dc.description.abstract	Named Entity Recognition (NER) is an essential task in knowledge acquisition and has recently been widely applied to biomedical entity extraction. In this thesis, we propose automatic protein entity recognition using rule-based and statistical approaches. The rule-based approach relies on core terms, function terms, predefined terms, and part-of-speech tags, and then applies six rules to boost performance. In experiments on the GENIA and SwissProt Reference corpora, the rule-based approach yields F-scores of 52% and 51%, respectively. The statistical approach is based on a concise Hidden Markov Model, with back-off models used to overcome the data sparseness problem. We use not only internal, external, and global features but also the output of the rule-based approach to identify protein entities. The statistical approach yields an F-score of 77% on both the GENIA and SwissProt Reference corpora. In addition, we use heuristic rules to uncover hidden named entities and expand them out of coordination variants. The term variant resolution system yields 89% recall and 69% precision.	en_US
dc.language.iso	en_US	en_US
dc.subject	Named Entity Recognition	zh_TW
dc.subject	Biomedicine	zh_TW
dc.subject	Markov Model	zh_TW
dc.subject	Named Entity Extraction	en_US
dc.subject	Biomedical	en_US
dc.subject	Hidden Markov Model	en_US
dc.title	Automatic Recognition of Protein Names in Biological Corpora	zh_TW
dc.title	Automatic Protein Entities Recognition from PubMed Corpus	en_US
dc.type	Thesis	en_US
dc.contributor.department	Institute of Computer Science and Engineering	zh_TW
Appears in Collections:	Thesis

Files in This Item:

353501.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.