Full metadata record
DC Field | Value | Language
dc.contributor.author | 吳孟哲 | en_US
dc.contributor.author | Wu, Meng-Che | en_US
dc.contributor.author | 陳信宏 | en_US
dc.contributor.author | Chen, Sin-Horng | en_US
dc.date.accessioned | 2015-11-26T01:02:13Z | -
dc.date.available | 2015-11-26T01:02:13Z | -
dc.date.issued | 2015 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT070160258 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/127267 | -
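dc.description.abstract | This thesis examines the relationships between, and the types of, personal names and titles of address in modern Chinese, both of which fall under named entities in natural language processing (NLP). We take the categories of titles recorded in CILIN (同義詞詞林) and E-Hownet (廣義知網), add Academia Sinica part-of-speech tags, and extract sentence patterns containing personal names and titles from the Chinese Gigaword corpus for analysis. We then examine the types of names and of titles in more detail. Within a sentence, a title may merge with adjacent words and change its meaning; we call this a "compound title" (複合式稱謂). We use Mutual Information and the T-score to measure how tightly compound titles are bound and further classify the compounds by meaning. From the perspective of social evaluation, we also divide titles into commendatory and derogatory titles and analyze the compound categories of each. Drawing on the Ministry of the Interior's name statistics for the national population, we analyze names in two parts, "surnames" and "given names". Counting the population under each surname shows that the 100 most common surnames account for 96.56% of the national population; in addition, given names use different characters depending on gender. We use finite-state machines to describe the categories and sentence patterns of names and titles, build separate title categories for the medical and education domains, and then tag all name and title patterns from those domains in a test corpus. The recall rates are 90.6% and 82.5%, respectively, showing that most of the type combinations and sentence patterns of names and titles in these domains are indeed tagged. | zh_TW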
dc.description.abstract | This thesis examines personal names and addressing terms, two important kinds of named entity in natural language processing (NLP). We collect addressing terms from CILIN and E-Hownet and analyze the types of personal names and addressing terms in the Linguistic Data Consortium (LDC) Chinese Gigaword corpus, using part-of-speech (POS) tags from the Chinese Knowledge Information Processing Group of the Institute of Information Science, Academia Sinica. Addressing terms may combine with adjacent words to form compounds, so we use Mutual Information and the T-score to identify the compound types of addressing terms. For personal names, we draw on statistics from the Department of Household Registration, M.O.I., R.O.C., and analyze last names and first names separately. The 100 most common last names account for 96.56% of the population, and the words chosen for first names differ by gender. Having identified the types of personal names and addressing terms, we use finite-state machines (FSMs) to build addressing terms for the medical and educational domains and to capture as many types of personal names and addressing terms as possible in our test corpus. The recall rates are 90.6% and 82.5%, respectively, which shows that the FSMs capture most of the types of personal names and addressing terms in the corpus. | en_US
dc.language.iso | zh_TW | en_US
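dc.subject | 命名實體 (Named Entity) | zh_TW
dc.subject | 中華現代人名 (Modern Chinese Personal Names) | zh_TW
dc.subject | 稱謂 (Titles of Address) | zh_TW
dc.subject | 有限狀態機 (Finite-State Machine) | zh_TW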
dc.subject | Named Entity | en_US
dc.subject | Modern Chinese Name | en_US
dc.subject | Title | en_US
dc.subject | Finite-State Machine | en_US
dc.title | 中華現代人名與稱謂之結構分析 | zh_TW
dc.title | An Analysis on the Structure of Modern Chinese Name and Title | en_US
dc.type | Thesis | en_US
dc.contributor.department | 電信工程研究所 (Institute of Communications Engineering) | zh_TW
Appears in Collections: Theses
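
The abstract says compound addressing terms are scored with Mutual Information and the T-score, but this record gives no formulas. As a rough illustration only, the Python sketch below uses the standard corpus-linguistics definitions of pointwise mutual information and the collocation t-score for a word pair. The function names and all counts are hypothetical and are not taken from the thesis.

```python
import math


def pointwise_mutual_information(pair_count, left_count, right_count, total):
    """PMI in bits: log2( P(x, y) / (P(x) * P(y)) ).

    pair_count  -- times the two words occur together (hypothetical count)
    left_count  -- occurrences of the first word
    right_count -- occurrences of the second word
    total       -- corpus size used to turn counts into probabilities
    """
    return math.log2(pair_count * total / (left_count * right_count))


def t_score(pair_count, left_count, right_count, total):
    """Collocation t-score: (observed - expected) / sqrt(observed).

    'expected' is the pair count predicted if the two words were independent.
    """
    expected = left_count * right_count / total
    return (pair_count - expected) / math.sqrt(pair_count)


if __name__ == "__main__":
    # Made-up counts for a title followed by some adjacent word.
    print(pointwise_mutual_information(8, 16, 32, 1024))
    print(t_score(16, 32, 64, 1024))
```

Under these definitions, a high PMI means the pair co-occurs far more often than chance would predict, while the t-score also rewards pairs that are frequent in absolute terms. That difference is one reason the two measures are often reported together.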