Title: 宋詞斷詞與本體論之建置
Building a Semantic Ontology with Song Ci Segmentation
Author: 許薰尹
Shin-Yean Hsu
曾憲雄
Shain-Shyong Tseng
College of Science, Technology and Digital Learning Program
Keywords: word segmentation (斷詞); Chinese word segmentation (中文斷詞); ontology (本體論); Song Ci (宋詞); Segmentation; ontology; SONG-DYNASTY
Issue date: 2005
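Abstract: Song Ci (宋詞), also called "long and short sentences" (長短句), followed Tang poetry as a treasure of Chinese art that has been passed down through the ages. Because Song Ci is written in verse (韻文), it is difficult for modern readers to learn. An ontology that supplies knowledge about Song Ci vocabulary, describing the semantics of words and the relationships among them, would help modern readers understand what the words mean. The first step in building such an ontology is to segment the sentences into words so that the relevant knowledge can be extracted. In this thesis, we propose a segmentation method tailored to the characteristics of Song Ci: lyrics are composed to fit the tune pattern (詞牌, Ci Pai), sentences pause according to rhythm, and Song Ci has its own distinctive Empty Words (領字). We then build a Song Ci vocabulary ontology from descriptions of word semantics to support learning. The thesis has two main parts: the Song Ci segmenter and ontology construction. The Song Ci segmenter uses rule-based segmentation to extract the words in each sentence. It comprises six segmentation modules: proper nouns (專有名詞), Empty Words (領字), literary quotations (典故), word building (構詞), rhythm-based segmentation (節奏斷詞), and pairs (對仗). The segmentation experiments show that recall, precision, and effectiveness reach as high as 90%. Ontology construction classifies the segmented words into semantic concepts and describes, for each word, its preceding and following words, word type, word frequency, synonyms, near-synonyms, antonyms, paired (對仗) words, and tonal pattern (平仄). We designed a semantic editing tool for editing this word information; it automatically generates OWL documents that express the ontology's knowledge, greatly reducing the burden of ontology construction. Finally, we designed the “絕妙好詞” web site so that users can easily retrieve word-semantics information over the Internet and learn online.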
Song Ci, also known as "Long Short Sentence", is a classical Chinese art form that followed Tang poetry. Because Song Ci is written in verse (韻文), it is hard for modern readers to learn. If we could construct an ontology that describes the semantics of words in Song Ci and the relationships among them, learning and understanding Song Ci would become easier. Before building the ontology, we segment the words contained in each sentence of Song Ci and acquire the related information needed for this purpose. In this thesis, we propose a segmentation method based on Ci Pai (詞牌), the rhythm of the poetry, and the Empty Word (領字) of Song Ci. We then construct a Song Ci ontology based on the semantics of words. This thesis contains two parts: the Song Ci Parser and the Ontology Building Module. The Song Ci Parser, a rule-based parser, includes six modules for Song Ci segmentation: Proper Noun Module (專有名詞模組), Empty Word Module (領字模組), Literary Quotation Module (典故模組), Word Building Module (構詞模組), Rhythm Module (節奏斷詞模組), and Pair Module. The experimental results show that the best recall, precision, and effectiveness rates reach 90%. The Ontology Building Module uses the words produced by the Song Ci Parser to build a concept hierarchy of words in Song Ci. We also design a Semantic Editor to describe the semantics of each word, e.g. Ci Pai (詞牌), author name, word frequency, word type, previous word, next word, antonym, near-synonym, synonym, etc. Finally, we build the “絕妙好詞” web site so that people can learn the semantics of words over the Internet.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009373517
http://hdl.handle.net/11536/80232
Appears in collections: Theses


Files in this item:

  1. 351701.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.