生物醫學領域專有名詞萃取 (Named Entity Extraction in Biomedical Domain)

Full metadata record

DC Field	Value	Language
dc.contributor.author	陳建行	en_US
dc.contributor.author	Jian-Hsin Chen	en_US
dc.contributor.author	梁婷	en_US
dc.contributor.author	Tyne Liang	en_US
dc.date.accessioned	2014-12-12T02:30:30Z	-
dc.date.available	2014-12-12T02:30:30Z	-
dc.date.issued	2002	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#NT910394105	en_US
dc.identifier.uri	http://hdl.handle.net/11536/70272	-
dc.description.abstract	本論文提出一個綜合式的生物醫學領域文件的自動化專有名詞辨識與分類系統。希望能藉此系統，以提供生物醫學物件關係之資訊萃取系統的前置處理。本系統的核心為馬可夫模型。我們由生物醫學文獻中萃取出內部特徵、外部特徵、及全域特徵，將這些特徵當成代表文字的特徵值。透過這些特徵值與馬可夫模型，我們可從未經處理過的文件中辨識出專有名詞。本論文提出了三種馬可夫模型分類器以供評估與比較。除了統計式的方法之外，我們亦使用歸納的經驗法則發掘出包含在含隱藏詞連接詞子句的專有名詞。實驗結果證明了我們所提出的方法之可行性。針對包含隱藏詞連接詞子句的分解，於165個含該句型的測試句中，可達到92%的求全率與46%的求準率。針對專有名詞的邊界標記，於1,685個測試句中，可達到72%的求全率與66%的求準率，針對蛋白質、去氧核糖核酸/核糖核酸、來源、與其他生物醫學專有名詞的分類，我們可達到63%的求全率與57%的求準率。	zh_TW
dc.description.abstract	In this thesis, we propose a hybrid system for automatic named entity extraction in the biomedical domain. The system is intended to serve as the front end of an information extraction system for biomedical object relation extraction. Its kernel is based on Hidden Markov Models (HMMs). We extract internal, external, and global features from the biomedical literature as the representative characteristics of each word. With these features and our HMM extractor, we recognize named entities in raw text. Three kinds of HMM classifiers are built for evaluation and comparison. Besides the statistical approach, we use heuristic rules to uncover hidden named entities and expand them out of coordinated clauses with ellipsis. Experimental results demonstrate the feasibility of the proposed approach. On 165 test sentences containing ellipsis patterns, we achieve 92% recall and 46% precision in expanding coordinated clauses with ellipsis. On 1,685 test sentences, the proposed named entity extraction system obtains 72% recall and 66% precision in identifying the boundaries of named entities, and 63% recall and 57% precision in categorizing entities into the classes Protein, DNA/RNA, Source, and Other biomedical entities.	en_US
dc.language.iso	en_US	en_US
dc.subject	生物醫學文件	zh_TW
dc.subject	專有名詞辨識	zh_TW
dc.subject	專有名詞分類	zh_TW
dc.subject	馬可夫模型	zh_TW
dc.subject	省略詞	zh_TW
dc.subject	Biomedical Literature	en_US
dc.subject	Named Entity Identification	en_US
dc.subject	Named Entity Classification	en_US
dc.subject	Hidden Markov Models	en_US
dc.subject	Ellipsis	en_US
dc.title	生物醫學領域專有名詞萃取	zh_TW
dc.title	Named Entity Extraction in Biomedical Domain	en_US
dc.type	Thesis	en_US
dc.contributor.department	資訊科學與工程研究所 (Institute of Computer Science and Engineering)	zh_TW
Appears in Collections:	Thesis