Full metadata record
DC Field | Value | Language
dc.contributor.author | 聶家祺 | en_US
dc.contributor.author | Nieh, Chia-Chi | en_US
dc.contributor.author | 梁婷 | en_US
dc.contributor.author | Liang, Tyne | en_US
dc.date.accessioned | 2014-12-12T01:43:49Z | -
dc.date.available | 2014-12-12T01:43:49Z | -
dc.date.issued | 2011 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT079755619 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/45965 | -
dc.description.abstract | Identifying relations between entities has long been an important task in document processing. Relations identified so far include working relations between persons and organizations, relations between diseases and drugs, relations between authors and their works, interactions between proteins, and equivalence relations between nouns. Most methods identify relations with learning models or pattern analysis, and a few use parse trees to identify the target relations from syntactic structure. The corpora these methods use are either static or dynamically updated (such as web search results). Identifying relations from a static corpus yields higher accuracy, but search engine results provide more up-to-date information. Because interpersonal relationships change frequently, this thesis identifies them from search engine results. We build a development corpus from Wikipedia and compile relation templates for kinship and working relations. To identify the domain and domain vocabulary of each person entity, we use bootstrapping to extract clue words from the development corpus and use them to expand queries so that relevant search results are retrieved. To speed up document processing, we apply simple person-name and part-of-speech tagging and resolve personal pronouns. We propose a two-stage identification procedure: the first stage matches templates, and the second stage applies a support vector machine (SVM) to seven extracted features. The features include the number and position of clue words, the mutual information of the persons, and the similarity between entities. The proposed method achieves an F-score of 0.86 on 396 kinship instances and 0.75 on 175 working-relation instances. | zh_TW
dc.description.abstract | Identifying relations among entities is an important task in document processing. Relations identified in previous research include co-working relations between persons and organizations, relations between diseases and medicines, relations between authors and artifacts, interactions between proteins, and equivalence relations among nominals. Most identification methods are based on machine learning algorithms or pattern matching; a few are based on parsing results. The corpora used for relation identification can be either static or dynamic (such as search engine results). Although identifying relations from a static corpus generally outperforms methods that use dynamic corpora, dynamic corpora contain more up-to-date information. In this thesis, we identify human relationships from retrieved snippets and use Wikipedia to construct a development corpus. We extract domain words from the development corpus with a bootstrapping algorithm and use them to expand queries so that the search results are more accurate. To speed up document processing, simple methods are implemented for part-of-speech tagging, person name tagging, and pronominal anaphora resolution. Relationships are identified in two stages: pattern matching followed by a support vector machine (SVM). The seven features used for identification include the number and position of clue words, the mutual information of persons, and the cosine similarity of entities related to the persons. The kinship identifier yields an F-score of 0.86 on 396 kinship instances, and the co-working identifier yields an F-score of 0.75 on 175 co-working instances. | en_US
dc.language.iso | en_US | en_US
dc.subject | relation identification | zh_TW
dc.subject | natural language | zh_TW
dc.subject | search engine | zh_TW
dc.subject | domain vocabulary | zh_TW
dc.subject | relation identification | en_US
dc.subject | natural language processing | en_US
dc.subject | search engine | en_US
dc.subject | domain word | en_US
dc.title | Identifying Interpersonal Relationships from Search Results | zh_TW
dc.title | Identify Human Relationship From Retrieved Snippets | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Computer Science and Engineering | zh_TW
Appears in Collections: Thesis
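The abstracts list clue-word counts and positions, the mutual information of persons, and the cosine similarity of entities among the SVM features, but they do not define these features exactly. The following is a minimal sketch of how such features could be computed. It assumes pre-tokenized snippets, pointwise mutual information estimated from snippet co-occurrence counts, and bag-of-words context vectors. The function names and the "clue words between the two names" position feature are hypothetical illustrations, not the thesis's definitions.

```python
import math
from collections import Counter
from typing import Sequence, Set, Tuple


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information log2(P(x,y) / (P(x) P(y))).

    Assumed estimation: counts of snippets mentioning both names, each
    name, and the total number of snippets.
    """
    if min(count_xy, count_x, count_y) == 0:
        return float("-inf")  # the names never co-occur (or one is unseen)
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two bag-of-words context vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def clue_word_features(tokens: Sequence[str], clue_words: Set[str],
                       person_a: str, person_b: str) -> Tuple[int, int]:
    """Return (number of clue words, number of clue words between the names).

    The "between the names" position feature is a hypothetical stand-in.
    """
    positions = [i for i, tok in enumerate(tokens) if tok in clue_words]
    if person_a not in tokens or person_b not in tokens:
        return len(positions), 0
    lo, hi = sorted((tokens.index(person_a), tokens.index(person_b)))
    between = sum(1 for i in positions if lo < i < hi)
    return len(positions), between
```

For example, `clue_word_features(["Alice", "is", "the", "daughter", "of", "Bob"], {"daughter", "father"}, "Alice", "Bob")` returns `(1, 1)`. English tokens are used here only for readability. The thesis processes Chinese snippets after its own name and part-of-speech tagging.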


Files in This Item:

  1. 561901.pdf
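The abstracts describe a two-stage identification procedure. Relation templates are matched first, and an SVM decides the remaining cases. The following is a minimal sketch of that control flow for one relation type at a time. It assumes regular-expression templates and an already-trained linear SVM that is applied through its decision function sign(w · x + b). The templates, weights and function names are hypothetical, because the thesis's actual kinship and co-working templates and trained model are not given in this record.

```python
import re
from typing import Sequence

# Hypothetical stage-1 templates. The thesis compiled its real kinship and
# co-working templates from a Wikipedia development corpus.
TEMPLATES = {
    "kinship": [
        re.compile(r"(?P<a>\w+) is the (?:son|daughter|wife|husband) of (?P<b>\w+)"),
    ],
    "co-working": [
        re.compile(r"(?P<a>\w+) (?:works|worked) with (?P<b>\w+)"),
    ],
}


def matches_template(relation: str, snippet: str, a: str, b: str) -> bool:
    """Stage 1: does any template for `relation` link the two names?"""
    for regex in TEMPLATES[relation]:
        for m in regex.finditer(snippet):
            if {m.group("a"), m.group("b")} == {a, b}:
                return True
    return False


def svm_decision(features: Sequence[float], weights: Sequence[float],
                 bias: float) -> bool:
    """Stage 2: sign of w . x + b for an already-trained linear SVM."""
    return sum(w * x for w, x in zip(weights, features)) + bias > 0


def has_relation(relation: str, snippet: str, a: str, b: str,
                 features: Sequence[float], weights: Sequence[float],
                 bias: float) -> bool:
    """Two-stage identification: template match first, SVM otherwise."""
    if matches_template(relation, snippet, a, b):
        return True
    return svm_decision(features, weights, bias)
```

In this design a template hit is treated as a high-precision decision, and the SVM is consulted only when no template applies. The `features` vector would hold values such as those computed by the feature functions sketched earlier.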

If it is a zip file, please download the file and unzip it, then open index.html in a browser to view the full text content.