Title: Coreference Resolution in Biomedical Literature (生物文献中同指涉问题处理之研究)
Authors: 林裕祥
梁婷
Institute of Computer Science and Engineering
Keywords: coreference; anaphora; pronominal; sortal; abbreviation
Issue Date: 2004
Abstract: Coreference resolution involves both anaphora resolution and abbreviation linkage. For abbreviations, we use a rule-based approach comprising seven rules, aided by an NP-chunker, to identify each abbreviation and its long form; it achieves 97% precision and 88% recall. We also address pronominal and sortal anaphora, both of which are common in biomedical texts. Their resolution draws on the UMLS ontology and on SA/AO (subject-action/action-object) patterns mined from a biomedical corpus. For sortal anaphora involving unknown words, we use headwords collected from UMLS and patterns mined from PubMed. The antecedent of each anaphor is selected by a salience grading mechanism whose feature scores were tuned with a genetic algorithm. Compared with previous approaches on the same MEDLINE abstracts, the proposed method achieves a 92% F-score for pronominal anaphora and a 78% F-score for sortal anaphora.
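The abstract reports precision and recall for abbreviation resolution but F-scores for anaphora resolution. To put the two on a comparable footing, the sketch below combines precision and recall into an F-score. It assumes the balanced F1 measure (the harmonic mean of precision and recall); the record does not say which F-score variant the thesis used, and the function name `f_score` is only illustrative.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the balanced F1 (an assumption here)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing was found correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Abbreviation figures quoted in the abstract: 97% precision, 88% recall.
abbreviation_f1 = f_score(0.97, 0.88)
print(f"{abbreviation_f1:.3f}")  # prints 0.923
```

Under that assumption, the abbreviation component would score roughly 92% F1. This is a value derived here from the quoted precision and recall, not a figure reported in the thesis.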
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009123534
http://hdl.handle.net/11536/52891
Appears in Collections: Thesis


Files in This Item:

  1. 353401.pdf
  2. 353402.pdf

If it is a zip file, please download the file and unzip it, then open index.html in a browser to view the full text content.