Title: | 意見探勘在中文評鑑語料之應用 Applying Opinion Mining to Chinese Review Corpus |
Authors: | 謝鎮宇 Hsieh, Chen-Yu; 梁婷 Liang, Tyne; 資訊學院資訊學程 (College of Computer Science, Computer Science Program) |
Keywords: | 意見探勘; 中文評鑑語料; Opinion mining; Chinese review corpus |
Issue Date: | 2010 |
Abstract: | People now routinely express opinions on social issues and commercial products through platforms such as Facebook, Twitter, and Plurk. This thesis presents a prototype hotel evaluation system that applies opinion mining techniques to a Chinese review corpus. The system comprises four stages: corpus processing, feature word extraction, opinion word and polarity identification, and hotel evaluation. Target feature words are acquired manually and then expanded using the Academia Sinica Bilingual Ontological WordNet (BOW). Opinion words are collected from the National Taiwan University Sentiment Dictionary (NTUSD) and expanded using the hotel review corpus. Two scoring functions are proposed for determining the polarity of opinion words: one based on NTUSD and the other based on the Sinorama magazine (SINO) corpus. Manual annotation shows that an additional 122 positive and 83 negative words describing the target features can be mined. Finally, a hotel scoring function that considers five categories of hotel features is used to evaluate 28 hotels. Experimental results show that an F-score of about 79% can be obtained when a sufficient review corpus is used. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079679538 http://hdl.handle.net/11536/44082 |
Appears in Collections: | Thesis |