Title: 唐诗之诗风探勘
Style mining for Tang Poetry
Authors: 王乃仁
Nai-Jun Wang
曾宪雄
Shian-Shyong Tseng
Degree Program in Technology and Digital Learning, College of Science
Keywords: Tang Poetry; style mining; concept hierarchy; clustering; association rule mining
Issue Date: 2005
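Abstract:   Poetry is one of the great literary creations of Chinese culture and flourished most in the Tang Dynasty. It is a form of rhymed verse that emphasizes harmonious rhyme. It is written in classical Chinese, which differs greatly from modern vernacular Chinese, and Chinese words are prone to synonymy and ambiguity. In mining the style of poetic texts, the concepts behind the words are numerous and intricate, so the analysis is not easy even for expert scholars. Although literary critics have grouped Tang poems into different styles by author, the conditions or rules behind these groupings are not explicit, and deriving style classification rules from the words of the poems is even harder.
  This thesis applies association rule mining, a data mining technique, to poems by literati of the High Tang period to find the noun combination rules that poets favored when composing. The first stage is noun extraction: the words poets use to describe scenery and the changes of things are mostly nouns, so nouns are extracted from the poems as the basic data for analysis. The second stage is noun concept generalization: this thesis proposes a Tang poetry noun concept hierarchy that generalizes nouns with complex meanings into a compact set of noun categories, forming the Tang poetry noun category set. The third stage is clustering by dissimilarity of noun usage: poems are clustered by comparing the dissimilarity in their use of noun categories, yielding clusters of poems by style. The fourth stage is association rule mining on noun usage: association rule mining is used to analyze the combinations of noun categories in each style cluster, and style rules of noun usage are identified according to confidence and support. The analysis shows that the experimental method divides the poems of Wang Wei (王维) into groups such as a frontier poem cluster and a landscape poem cluster, which corroborates the assessments of poetry experts. In addition, Tang poems of the landscape school compiled by literary critics were put through noun extraction and noun concept generalization and compared with the experimental results; they are close to the landscape poem cluster and follow the same noun usage rules.
  This experimental method can objectively identify rules for distinguishing poetic styles based on differences in noun usage. These rules can help poetry researchers analyze or classify poems, and help learners understand the stylistic meaning expressed by noun usage in poems.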
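The second and third stages described above (noun concept generalization and dissimilarity comparison) can be illustrated with a minimal sketch. The abstract does not state which dissimilarity measure the thesis uses, so Jaccard dissimilarity is an assumption here, and the tiny `CONCEPT_HIERARCHY` mapping and its category names are hypothetical placeholders, not the thesis's actual hierarchy.

```python
# Hypothetical one-level concept hierarchy: noun -> noun category.
# The real Tang noun concept hierarchy is defined in the thesis, not here.
CONCEPT_HIERARCHY = {
    "月": "celestial", "日": "celestial",
    "山": "landscape", "水": "landscape",
    "松": "plant", "竹": "plant",
    "马": "frontier", "剑": "frontier",
}


def generalize(nouns):
    """Map extracted nouns to their categories, dropping unknown nouns."""
    return {CONCEPT_HIERARCHY[n] for n in nouns if n in CONCEPT_HIERARCHY}


def jaccard_dissimilarity(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of noun categories.

    Assumed measure: the abstract only says poems are compared by the
    dissimilarity of their noun category usage.
    """
    union = a | b
    if not union:
        return 0.0  # two empty poems are treated as identical
    return 1.0 - len(a & b) / len(union)


# Two hypothetical poems, each given as its list of extracted nouns.
poem_a = generalize(["山", "水", "月"])
poem_b = generalize(["山", "竹"])
distance = jaccard_dissimilarity(poem_a, poem_b)
```

A clustering step would then group poems whose pairwise dissimilarity is small; which clustering algorithm the thesis uses is not given in this record.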
Poetry is among the greatest creations of Chinese literature and flourished especially in the Tang Dynasty. It is a form of rhymed verse that emphasizes harmonious rhyme. Analyzing Tang poetry is challenging because the classical Chinese used in the poems differs from modern Chinese, and because Chinese vocabulary contains many synonyms and ambiguous words. Analyzing the style of Tang poetry is therefore difficult.
In this thesis, we propose a data mining approach to discover a poet's style. First, in noun retrieval, the nouns in each poem are treated as features of poetic style and extracted for analysis. Second, in noun concept generalization, we map the nouns to their corresponding concepts using a Tang noun concept hierarchy and build the Tang concept hierarchy table from the results. Third, in style clustering, we cluster the poems by comparing the dissimilarity of their nouns. Finally, in noun association rule mining, we apply association rule mining to discover the noun associations in the poems. Through these steps we obtained the noun sets of the poems and the noun association rules. The results were verified experimentally, and poetic styles can be distinguished using these association rules.
In the future, the noun association rules that style mining discovers objectively can support further research on Tang poetry.
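As a rough illustration of the final stage, the sketch below computes support and confidence for rules between pairs of noun categories. It is a simplified, assumed setup rather than the thesis's implementation: each poem is represented as a set of noun categories, only one-to-one rules are enumerated, and the category names, the helper names and the thresholds are hypothetical.

```python
from itertools import combinations


def support(itemset, poems):
    """Fraction of poems that contain every category in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for p in poems if itemset <= p) / len(poems)


def confidence(antecedent, consequent, poems):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(antecedent, poems)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), poems) / base


def pair_rules(poems, min_support, min_confidence):
    """List (lhs, rhs, support, confidence) for single-category rules."""
    items = sorted(set().union(*poems))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, poems)
            c = confidence({lhs}, {rhs}, poems)
            if s >= min_support and c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


# Hypothetical style cluster: each poem is its set of noun categories.
cluster = [
    {"landscape", "plant"},
    {"landscape", "celestial"},
    {"landscape", "plant", "celestial"},
    {"frontier"},
]
rules = pair_rules(cluster, min_support=0.5, min_confidence=0.9)
```

A full miner such as Apriori would also consider larger itemsets; the pairwise version is only meant to show how the two thresholds filter candidate rules.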
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009373502
http://hdl.handle.net/11536/80218
Appears in Collections: Thesis


Files in This Item:

  1. 350202.pdf
