Title: | 運用SVM分類技術提昇民意信箱服務品質之研究 To enhance the quality of research for public mailbox service by using the techniques of SVM classification |
Authors: | 謝東良 Shie, Dung-Liang 蔡銘箴 Tsai, Min-Jen College of Management, Information Management Program |
Keywords: | Public Mailbox; Support Vector Machine; Data Mining; Text Mining; Text Classification; Machine Learning |
Issue Date: | 2009 |
Abstract: | Government agencies set up public mailboxes to "provide citizens with a convenient channel for municipal suggestions and improve the quality of public service." Through the website, an agency receives thousands of letters from the public each year, most of them municipal suggestions, petitions, reports, or inquiries. In the past, citizens chose the responsible department themselves when submitting a case, and the average accuracy of that choice was only about 56% (2005 to 2009), which is unsatisfactory. Routing cases manually, by reading them one by one and then deciding on a department, is not cost-effective either, because not everyone is familiar with how responsibilities are divided among departments. This motivates the study: to build a classification system that suggests a department when a citizen submits a case, reducing the processing cost caused by misrouted cases and keeping case classification consistent.
This study uses information technology (IT) to build a document classification mechanism that improves service quality, based on text mining and the Support Vector Machine (SVM). In pre-processing, the Chinese word segmentation system (CKIP) identifies the words in each text before any further processing. Feature terms are weighted with TF-IDF, a weighting scheme commonly used in information retrieval and text mining to determine the weight of each keyword. Finally, the model learned by the SVM classifier predicts the category of unseen cases, with an average classification accuracy of about 77% (maximum 81%).
The experimental results indicate that information technology can turn data into useful knowledge and improve the quality of public service. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079764511 http://hdl.handle.net/11536/46242 |
Appears in Collections: | Thesis |
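The pipeline outlined in the abstract (word segmentation, TF-IDF weighting, SVM classification) might look roughly like the sketch below. It is an illustrative toy, not the thesis implementation. The following are all assumptions: the tokens are taken as already segmented (standing in for CKIP output), the sample cases and the two department labels are invented, the TF-IDF variant is tf = count / document length with idf = ln(N / df), and the classifier is a plain linear SVM trained by subgradient descent on the hinge loss. The record does not state the kernel, the parameters, or the multi-class strategy actually used.

```python
import math
from collections import Counter


def tfidf_fit(docs):
    """Build vocabulary, IDF table, and TF-IDF vectors from tokenised docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    return vocab, idf, [vectorize(doc, vocab, idf) for doc in docs]


def vectorize(tokens, vocab, idf):
    """TF-IDF vector for one document; out-of-vocabulary tokens are ignored."""
    counts = Counter(tokens)
    return [counts[t] / len(tokens) * idf[t] for t in vocab]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (dot(w, xi) + b) < 1
            for j in range(len(w)):
                # Regulariser always contributes; hinge term only inside the margin.
                grad = lam * w[j] - (yi * xi[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical pre-segmented cases: +1 = public works, -1 = environment.
train_docs = [
    ["pothole", "road", "repair"],
    ["road", "streetlight", "broken"],
    ["garbage", "noise", "complaint"],
    ["noise", "garbage", "truck"],
]
train_labels = [1, 1, -1, -1]

vocab, idf, X = tfidf_fit(train_docs)
w, b = train_linear_svm(X, train_labels)

for case in (["road", "pothole"], ["garbage", "complaint"]):
    print(case, predict(w, b, vectorize(case, vocab, idf)))
```

A real system with many departments would need a multi-class scheme, for example one binary classifier per department (one-vs-rest), and would normally use an established SVM library rather than a hand-written trainer.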