Domain-space Weighting Scheme for Document Classification

Full metadata record

DC field	Value	Language
dc.contributor.author	鄭佩琪	en_US
dc.contributor.author	Pei-Chi Cheng	en_US
dc.contributor.author	曾憲雄	en_US
dc.contributor.author	Shian-Shyong Tseng	en_US
dc.date.accessioned	2014-12-12T02:04:03Z	-
dc.date.available	2014-12-12T02:04:03Z	-
dc.date.issued	2003	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT009123516	en_US
dc.identifier.uri	http://hdl.handle.net/11536/52691	-
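dc.description.abstract	As electronic documents develop and multiply, automatic document classification has become increasingly important for helping users discover and manage information. Many typical classification methods, such as C4.5, SVM, and naïve Bayes, have been applied to build document classifiers. However, most of them are batch-based mining techniques and cannot handle the category adaptation problem, in which the classifier must adapt as categories are added over time. In addition, regarding the document representation problem, most representations describe documents in term-space, which can produce many unrepresentative dimensions and thereby reduce the classifier's efficiency and effectiveness. This thesis proposes a domain-space weighting scheme that represents documents in domain-space and builds the document classifier incrementally, addressing both the category adaptation problem and the document representation problem. The scheme consists of three phases: the Training Phase, the Discrimination Phase, and the Tuning Phase. In the Training Phase, the scheme extracts, for each category, the features that best represent it, weights them by their importance to that category, and stores the results in the feature-domain association weighting table, which records the degree of association between each feature and all related domains. In the Discrimination Phase, the scheme lowers the weights of features with little discriminating power during classification, reducing their influence. At this point the classifier is complete, as defined by the feature-domain association weighting table. The Tuning Phase is optional and uses information from tuning documents to strengthen the classifier. In the experiments, we evaluated the resulting classifier on the standard Reuters-21578 test collection, using the “ModApte” split. The results show that the classifier is more effective when enough training documents are available, and that the Tuning Phase strengthens it further.	zh_TW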
dc.description.abstract	With the growing availability of digital documents, automatic document classification (a.k.a. document categorization) has become increasingly important for managing information and helping users discover useful information. Many typical classification approaches, such as C4.5, SVM, and naïve Bayes, have been applied to develop classifiers. However, most of them are batch-based mining approaches, which cannot resolve the category adaptation problem. Moreover, with respect to the document representation problem, documents are usually represented in term-space, which may produce many less representative dimensions and thus decrease both efficiency and effectiveness. In this thesis, we propose a domain-space weighting scheme that represents documents in domain-space and incrementally constructs a classifier, resolving both the document representation and the category adaptation problems. The proposed scheme consists of three major phases: the Training Phase, the Discrimination Phase, and the Tuning Phase. In the Training Phase, the scheme incrementally extracts and weights features from each individual category and then integrates the results into the feature-domain association weighting table, which maintains the association weight between each feature and all involved categories. In the Discrimination Phase, it diminishes the weights of features with lower discriminating power. A classifier can therefore be constructed from the feature-domain association weighting table. Finally, the optional Tuning Phase strengthens the classifier using feedback from tuning documents. Experiments on the standard Reuters-21578 benchmark (“ModApte” split) show that, given enough training documents, the classifier constructed by the proposed scheme is fairly effective, and that the Tuning Phase strengthens it further.	en_US
dc.language.iso	en_US	en_US
dc.subject	文件分類 (document classification)	zh_TW
dc.subject	文件表示 (document representation)	zh_TW
dc.subject	維度縮減 (dimension reduction)	zh_TW
dc.subject	文詞權重 (term weighting)	zh_TW
dc.subject	document classification	en_US
dc.subject	document representation	en_US
dc.subject	dimension reduction	en_US
dc.subject	term weighting	en_US
dc.title	應用在文件分類的領域空間權重機制 (Domain-space Weighting Scheme for Document Classification)	zh_TW
dc.title	Domain-space Weighting Scheme for Document Classification	en_US
dc.type	Thesis	en_US
dc.contributor.department	資訊科學與工程研究所 (Institute of Computer Science and Engineering)	zh_TW
Appears in collections:	Theses
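The abstract describes the three phases only at a high level, so the following is a minimal Python sketch of how a feature-domain association weighting table could support them, not the thesis's actual method. It assumes a document-frequency weight for the Training Phase, an entropy-based discriminating factor for the Discrimination Phase, and perceptron-style feedback for the optional Tuning Phase; these formulas, the class `DomainSpaceClassifier`, and the `tuning_rate` parameter are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class DomainSpaceClassifier:
    """Hypothetical sketch of a feature-domain association weighting table.

    The weighting, discrimination and tuning formulas are illustrative
    assumptions, not the formulas defined in the thesis.
    """

    def __init__(self, tuning_rate=0.1):
        # raw[feature][domain] -> association weight from the Training Phase
        self.raw = defaultdict(dict)
        self.domains = set()
        self.tuning_rate = tuning_rate  # hypothetical tuning parameter

    def train_category(self, domain, documents):
        """Training Phase: weight the features of one category (assumed
        weight: fraction of the category's documents containing the feature).
        Categories are added one at a time, so a new category does not
        require retraining the existing ones."""
        if not documents:
            return
        doc_freq = Counter()
        for tokens in documents:
            doc_freq.update(set(tokens))
        for feature, df in doc_freq.items():
            self.raw[feature][domain] = df / len(documents)
        self.domains.add(domain)

    def discriminating_power(self, feature):
        """Discrimination Phase (assumed factor): 1 - normalized entropy of
        the feature's weights across domains. A feature spread evenly over
        all domains gets a factor near 0; one tied to a single domain gets 1."""
        weights = self.raw.get(feature, {})
        total = sum(weights.values())
        if total == 0 or len(self.domains) < 2:
            return 1.0
        entropy = -sum((w / total) * math.log(w / total)
                       for w in weights.values() if w > 0)
        return max(0.0, 1.0 - entropy / math.log(len(self.domains)))

    def domain_vector(self, tokens):
        """Represent a document in domain-space: one score per domain."""
        scores = dict.fromkeys(sorted(self.domains), 0.0)
        for feature in set(tokens):
            weights = self.raw.get(feature)
            if not weights:
                continue  # feature never seen in training: ignored
            power = self.discriminating_power(feature)
            for domain, w in weights.items():
                scores[domain] += w * power
        return scores

    def classify(self, tokens):
        scores = self.domain_vector(tokens)
        if not scores:
            return None  # no categories trained yet
        return max(scores, key=scores.get)  # ties go to the first domain name

    def tune(self, tokens, true_domain):
        """Optional Tuning Phase (assumed perceptron-style feedback): on a
        misclassified tuning document, raise its features' weights for the
        true domain and lower them for the wrongly predicted domain."""
        predicted = self.classify(tokens)
        if predicted == true_domain:
            return False
        for feature in set(tokens):
            weights = self.raw[feature]
            weights[true_domain] = weights.get(true_domain, 0.0) + self.tuning_rate
            if predicted in weights:
                weights[predicted] = max(0.0, weights[predicted] - self.tuning_rate)
        self.domains.add(true_domain)
        return True


if __name__ == "__main__":
    clf = DomainSpaceClassifier()
    clf.train_category("grain", [["wheat", "price", "export"], ["wheat", "corn", "price"]])
    clf.train_category("crude", [["oil", "price", "barrel"], ["oil", "opec", "price"]])
    print(clf.classify(["wheat", "price"]))  # "price" occurs in both domains, so it barely counts
```

In this sketch the discriminating factor is computed at classification time rather than stored, so calling `train_category` for a new category automatically updates every feature's factor. That mirrors the incremental, category-adaptive goal stated in the abstract, at the cost of recomputing entropies on each lookup.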

Files in this item:

351601.pdf
