A Study on Automated Text Summarization and Its Application on Chinese Documents

Full metadata record

DC Field	Value	Language
dc.contributor.author	葉鎮源	en_US
dc.contributor.author	Jen-Yuan Yeh	en_US
dc.contributor.author	柯皓仁	en_US
dc.contributor.author	楊維邦	en_US
dc.contributor.author	Hao-Ren Ke	en_US
dc.contributor.author	Wei-Pang Yang	en_US
dc.date.accessioned	2014-12-12T02:27:55Z	-
dc.date.available	2014-12-12T02:27:55Z	-
dc.date.issued	2001	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#NT900394087	en_US
dc.identifier.uri	http://hdl.handle.net/11536/68615	-
dc.description.abstract	本論文提出了兩種新的文件摘要方法來摘錄原始文件中的重要語句。第一個方法屬於以文件集為基礎的摘要技術(Corpus-based Approach)，此方法基於統計模型，利用特徵的分析來計算語句重要性。我們提出三個新的想法：1) 利用語句位置重要性的分級以提高不同語句位置的重要性；2)利用詞彙相關程度(Word Co-occurrence)計算找出文件中的新詞，並將新詞加入關鍵詞重要性的計算，以得到更精確的關鍵詞權重特徵值；3) 利用基因演算法訓練計算語句權重的Score Function，以期了解訓練文件集的特性。第二個方法，我們結合潛在語意分析(Latent Semantic Analysis)與主題相關地圖(Text Relationship Map)的概念，用來擷取文件中的概念結構(Conceptual Structure)以期得到語意層面的分析。實驗中，我們收集100篇新台灣週刊中關於政治類的文章，並將上述的兩種方法應用於中文文件的摘要實驗上。效益評估結果顯示，我們所提的方法都有不錯的表現，在壓縮比為30%的情況下，平均來說，召回率分別為52.0%及45.6%。	zh_TW
dc.description.abstract	This thesis proposes two novel approaches for extracting important sentences from a document to form its summary. The first is a corpus-based approach built on feature analysis. It introduces three new ideas: 1) ranking sentence positions to emphasize the significance of position, 2) reshaping word units to obtain more accurate keyword importance, and 3) training a score function with a genetic algorithm to obtain a suitable combination of feature weights. The second approach combines latent semantic analysis with text relationship maps to capture the conceptual structure of a document. Both approaches are applied to Chinese text summarization. They were evaluated on a corpus of 100 articles about politics from New Taiwan Weekly; at a compression ratio of 30%, they achieved average recalls of 52.0% and 45.6%, respectively.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	中文文件摘要	zh_TW
dc.subject	以文件集為基礎的摘要技術	zh_TW
dc.subject	潛在語意分析	zh_TW
dc.subject	主題關係地圖	zh_TW
dc.subject	Chinese Text Summarization	en_US
dc.subject	Corpus-based Approach	en_US
dc.subject	Latent Semantic Analysis	en_US
dc.subject	Text Relationship Map	en_US
dc.title	文件自動化摘要方法之研究及其在中文文件的應用	zh_TW
dc.title	A Study on Automated Text Summarization and Its Application on Chinese Documents	en_US
dc.type	Thesis	en_US
dc.contributor.department	資訊科學與工程研究所	zh_TW
Appears in Collections:	Thesis

Files in This Item:

039408701.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.