Title: Data Mining for Automatic Classification of Customer Problems – A Case Study of Public Opinion Mailbox of a Water Company
Original title: 應用資料探勘於顧客問題自動分類之研究:以自來水公司民眾意見信箱為例
Authors: 鄭哲明
劉敦仁
Technology and Digital Learning Program, College of Science
Keywords: Data Mining; Text Classification; Decision Tree; Naive Bayes; Neural Network
Issue Date: 2015
Abstract: Online platforms for exchanging opinions and reporting problems have become an important channel of communication between the public and government agencies and state-owned enterprises, and routing these problems and opinions quickly to the right handler has become key to improving the quality of public service. This study uses the content of a water company's public opinion mailbox as research data for automatic document classification. A relational database of documents and terms was designed, and the CKIP word segmentation service, together with data preprocessing programs, automatically converts the dataset into a document-term vector matrix. Preprocessing produces 4,585 candidate terms. Supervised discretization and supervised attribute selection then extract 55 key terms to serve as the classification attributes. This large reduction in attribute dimensionality both improves classification accuracy and shortens execution time for the classification models built afterward. Three classification algorithms were tested on the mailbox data: the C4.5 decision tree, Naive Bayes, and a back-propagation neural network. Naive Bayes performed best, with an accuracy of 84.76% and an execution time of only 0.04 seconds. Naive Bayes is also easy to implement, which makes it well suited to the automated classification system planned for the future.
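The classification step described in the abstract can be illustrated with a minimal sketch of multinomial Naive Bayes with add-one (Laplace) smoothing over tokenized documents. This is not the thesis's implementation: the record does not name the software used, the function names `train_naive_bayes` and `predict` are hypothetical, and the toy English tokens and the two category labels stand in for CKIP-segmented Chinese terms and the water company's real categories.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and term occurrences per class."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log-posterior for the tokens."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        # Log prior: share of training documents in this class.
        score = math.log(n_docs / total_docs)
        # Add-one smoothing: every vocabulary term gets one extra count.
        denom = sum(term_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # terms unseen in training are ignored
                score += math.log((term_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: already-segmented messages and their categories.
train_docs = [
    ["water", "bill", "overcharged"],
    ["bill", "payment", "refund"],
    ["pipe", "leak", "street"],
    ["leak", "repair", "pipe"],
]
train_labels = ["billing", "billing", "repair", "repair"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, ["pipe", "leak", "bill"]))
print(predict(model, ["refund", "payment"]))
```

In a pipeline like the one the abstract describes, the token lists would be restricted to the selected key terms before training, which keeps the per-class count tables small.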
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070152807
http://hdl.handle.net/11536/125626
Appears in Collections: Thesis