Full metadata record
DC Field | Value | Language
dc.contributor.author | 鄭佩琪 | en_US
dc.contributor.author | Pei-Chi Cheng | en_US
dc.contributor.author | 曾憲雄 | en_US
dc.contributor.author | Shian-Shyong Tseng | en_US
dc.date.accessioned | 2014-12-12T02:04:03Z | -
dc.date.available | 2014-12-12T02:04:03Z | -
dc.date.issued | 2003 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT009123516 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/52691 | -
dc.description.abstract | With the growth of electronic documents, automatic document classification has become increasingly important for helping users discover and manage information. Many typical classification methods, such as C4.5, SVM, and naïve Bayesian, have been applied to build document classifiers. However, most of these methods are batch-based mining techniques and cannot handle the category adaptation problem, in which the set of categories grows over time. In addition, regarding the document representation problem, most representations describe documents in term-space, which may produce many unrepresentative dimensions and thereby reduce the efficiency and effectiveness of the classifier. This thesis proposes a domain-space weighting scheme that represents documents in domain-space and builds a document classifier incrementally, addressing both the category adaptation problem and the document representation problem. The scheme comprises three phases: the Training Phase, the Discrimination Phase, and the Tuning Phase. In the Training Phase, the scheme extracts, for each category, features that adequately represent that category and assigns each a weight according to its importance to the category; the results are stored in the feature-domain association weighting table, which records the degree of association between each feature and all related domains. In the Discrimination Phase, the scheme lowers the weights of features with little discriminating power in classification, reducing their influence on the result. At this point the classifier has been built from the feature-domain association weighting table. The Tuning Phase is optional and uses information from tuning documents to strengthen the classifier. In the experiments, we evaluate the classifier on the standard Reuters-21578 test collection with the “ModApte” split. The results show that the classifier is more effective when given enough training documents and is further strengthened by the Tuning Phase. | zh_TW
dc.description.abstract | With the growing number and availability of digital documents, automatic document classification (a.k.a. document categorization) has become increasingly important for managing and discovering useful information for users. Many typical classification approaches, such as C4.5, SVM, and Naïve Bayesian, have been applied to develop classifiers. However, most of them are batch-based mining approaches, which cannot resolve the category adaptation problem. As for the document representation problem, documents are usually represented in term-space, which may yield many less representative dimensions and thus reduce efficiency and effectiveness. In this thesis, we propose a domain-space weighting scheme that represents documents in domain-space and incrementally constructs a classifier to resolve both the document representation and category adaptation problems. The proposed scheme consists of three major phases: the Training Phase, the Discrimination Phase, and the Tuning Phase. In the Training Phase, the scheme first incrementally extracts and weights features from each individual category, and then integrates the results into the feature-domain association weighting table, which maintains the association weight between each feature and all involved categories. In the Discrimination Phase, it diminishes the weights of features with lower discriminating power. A classifier can then be constructed from the feature-domain association weighting table. Finally, the optional Tuning Phase strengthens the classifier using feedback from tuning documents. Experiments on the standard Reuters-21578 benchmark with the “ModApte” split show that, with enough training documents, the classifier constructed by the proposed scheme is rather effective and is further strengthened by the Tuning Phase. | en_US
dc.language.iso | en_US | en_US
dc.subject | 文件分類 (document classification) | zh_TW
dc.subject | 文件表示 (document representation) | zh_TW
dc.subject | 維度縮減 (dimension reduction) | zh_TW
dc.subject | 文詞權重 (term weighting) | zh_TW
dc.subject | document classification | en_US
dc.subject | document representation | en_US
dc.subject | dimension reduction | en_US
dc.subject | term weighting | en_US
dc.title | 應用在文件分類的領域空間權重機制 | zh_TW
dc.title | Domain-space Weighting Scheme for Document Classification | en_US
dc.type | Thesis | en_US
dc.contributor.department | 資訊科學與工程研究所 (Institute of Computer Science and Engineering) | zh_TW
Appears in Collections: Thesis


Files in This Item:

  1. 351601.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.
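
The abstract in this record describes a feature-domain association weighting table that is built one category at a time, adjusted in a Discrimination Phase, and optionally refined in a Tuning Phase. The following is a minimal Python sketch of that idea, not the thesis's actual method: the record gives no formulas, so the class name `DomainSpaceClassifier`, the relative-frequency weight, the 1/k damping factor, and the `rate` parameter are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


class DomainSpaceClassifier:
    """Toy feature-domain association weighting table (illustrative only)."""

    def __init__(self):
        # table[feature][category] -> association weight
        self.table = defaultdict(dict)

    def train_category(self, category, documents):
        """Training Phase: add one category without retraining the others."""
        counts = Counter(term for doc in documents for term in doc.split())
        total = sum(counts.values())
        for term, n in counts.items():
            # Assumed weight: relative frequency of the term in the category.
            self.table[term][category] = n / total

    def discriminate(self):
        """Discrimination Phase: damp features shared by many categories."""
        damped = {}
        for term, row in self.table.items():
            # Assumed factor: 1/k for a feature associated with k categories.
            factor = 1.0 / len(row)
            damped[term] = {c: w * factor for c, w in row.items()}
        return damped

    def classify(self, document):
        """Score each category by summing the damped weights of its terms."""
        weights = self.discriminate()
        scores = Counter()
        for term in document.split():
            for category, w in weights.get(term, {}).items():
                scores[category] += w
        return scores.most_common(1)[0][0] if scores else None

    def tune(self, document, category, rate=0.1):
        """Tuning Phase: reinforce the true category after a misclassification."""
        if self.classify(document) != category:
            for term in document.split():
                row = self.table[term]
                row[category] = row.get(category, 0.0) + rate


clf = DomainSpaceClassifier()
clf.train_category("grain", ["wheat corn harvest", "wheat export price"])
clf.train_category("crude", ["oil barrel price", "crude oil export"])
print(clf.classify("wheat harvest"))  # grain

# A new category is added later without rebuilding the existing rows.
clf.train_category("gold", ["gold mine ounce price"])
print(clf.classify("gold ounce"))  # gold
```

Computing the damping at classification time, rather than overwriting the stored weights, is one way to keep the table reusable when categories are added later; whether the thesis does this is not stated in the record.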