Title: 近體詩主題辨識系統研製
Jintishi Processing and Categorization
Authors: 王笙權
Wang, Sheng-Chuan
梁婷
Tyne, Liang
Institute of Computer Science and Engineering
Keywords: Topic Classification;Sentiment Classification;Word Sense Disambiguation
Issue Date: 2011
Abstract: Jintishi is one of the gems of Chinese literature, conveying rich emotion and thought in very few words. The poems often contain many allusions and follow syntactic and semantic parallelism, which makes them difficult for general readers to understand and to compose. This thesis therefore applies text classification techniques to Jintishi and builds a Jintishi topic identification system. The system provides poem search and poem processing functions, including word segmentation, concept (semantic) tagging, emotion identification, and topic identification. Topics are divided into six categories: Chanting Object (poems on objects that express aspirations), Landscape (landscape and pastoral), Desperate Wife (love and boudoir lament), Farewell (parting and remembering friends), Frontier (frontier and warfare), and Social Poem (society and people's livelihood). Emotions are labeled as one of three classes: happiness, anger, or sadness. For the topic identification experiment, we used 992 seven-character Lushi (regulated verse) as the corpus, extracted eight kinds of lexical and concept features from each poem, and classified the poems with a support vector machine (SVM). Under ten-fold cross-validation, the average topic identification accuracy was 69.12%. Using the same model on 492 seven-character Lushi, emotion identification reached an accuracy of 70.7%.
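The abstract reports accuracy averaged over ten-fold cross-validation. As a rough illustration of how such an average can be computed, the Python sketch below splits a labeled corpus into ten folds, trains on nine, tests on the held-out fold, and averages the per-fold accuracies. It is a sketch under stated assumptions, not the thesis implementation: the classifier is a simple feature-overlap stand-in rather than an SVM, and the feature tokens, labels, and function names are hypothetical and not taken from the thesis corpus.

```python
import random
from collections import Counter, defaultdict


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_centroids(samples, labels):
    """Stand-in classifier (NOT an SVM): accumulate feature counts per class."""
    centroids = defaultdict(Counter)
    for feats, label in zip(samples, labels):
        centroids[label].update(feats)
    return centroids


def predict(centroids, feats):
    """Return the class whose accumulated features overlap most with feats."""
    def score(label):
        counts = centroids[label]
        total = sum(counts.values()) or 1
        return sum(counts[f] for f in feats) / total
    return max(sorted(centroids), key=score)


def cross_validate(samples, labels, k=10, seed=0):
    """Return the mean accuracy over k held-out folds."""
    accuracies = []
    for held_out in k_fold_indices(len(samples), k, seed):
        if not held_out:
            continue  # skip empty folds when there are fewer samples than folds
        test_ids = set(held_out)
        train_x = [s for i, s in enumerate(samples) if i not in test_ids]
        train_y = [y for i, y in enumerate(labels) if i not in test_ids]
        model = train_centroids(train_x, train_y)
        correct = sum(predict(model, samples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


# Hypothetical toy corpus: each poem is reduced to a list of feature tokens.
samples = [["mountain", "river"]] * 10 + [["parting", "friend"]] * 10
labels = ["Landscape"] * 10 + ["Farewell"] * 10

mean_accuracy = cross_validate(samples, labels, k=10)
print(f"mean accuracy over 10 folds: {mean_accuracy:.4f}")
```

Swapping the stand-in for a real SVM would only change `train_centroids` and `predict`; the fold splitting and averaging stay the same.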
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079855513
http://hdl.handle.net/11536/48248
Appears in Collections: Thesis


Files in This Item:

  1. 551301.pdf
