Title: | 多文件摘要系統基於Mutual Reinforcement原理 Multi-Document Summarization System Based on Mutual Reinforcement Principle |
Authors: | 楊瑞敏 Yang, Ruin-Min; 李嘉晃 Lee, Chia-Hoang; Institute of Multimedia Engineering |
Keywords: | Multi-Document; Summarization System; Mutual Reinforcement Principle |
Publication date: | 2009 |
Abstract: | Research reports indicate that the rapid growth of the Internet has caused the volume of digital documents, images, and other data produced each year to multiply. To help users digest these electronic documents efficiently, this thesis develops an automatic summarization system that distills large collections of digital documents to their essentials, so that users can grasp the content quickly without losing the information in the originals.
The proposed system scores sentences from three perspectives in order to select summary sentences: (1) the relationship between words and sentences; (2) the relationship between the title and sentences; (3) the relationship between sentences. Before scoring, the system uses an Alignment algorithm and the Mutual Reinforcement Principle to remove low-information sentences from the dataset, so that they cannot be selected as summary sentences. The three perspectives are implemented with the HITS algorithm, cosine similarity, and the PageRank algorithm, respectively.
The system is evaluated on the DUC dataset, which is in English and consists of news articles. Evaluation with the ROUGE toolkit shows that the summaries generated by the system achieve good performance. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079757552 http://hdl.handle.net/11536/46089 |
Appears in collections: | Thesis |
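
The three scoring perspectives named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the tokenizer, the function names (`hits_sentence_scores`, `pagerank_sentence_scores`, `rank_sentences`), the iteration counts, the damping factor, and the equal-weight sum used to combine the three scores are all assumptions made for illustration, and the Alignment-based filtering step is omitted because the abstract does not describe it in detail.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def normalize(scores):
    """Scale non-negative scores so that the largest becomes 1."""
    top = max(scores, default=0.0)
    return [s / top if top else 0.0 for s in scores]


def hits_sentence_scores(bags, iterations=30):
    """HITS-style mutual reinforcement on the word-sentence bipartite graph:
    a sentence scores high if it contains high-scoring words, and vice versa."""
    vocab = sorted({t for bag in bags for t in bag})
    word = {t: 1.0 for t in vocab}
    sent = [1.0] * len(bags)
    for _ in range(iterations):
        sent = normalize([sum(word[t] * c for t, c in bag.items()) for bag in bags])
        raw = {t: sum(sent[i] * bag[t] for i, bag in enumerate(bags)) for t in vocab}
        top = max(raw.values(), default=0.0)
        word = {t: (v / top if top else 0.0) for t, v in raw.items()}
    return sent


def pagerank_sentence_scores(bags, damping=0.85, iterations=50):
    """PageRank on a sentence graph whose edge weights are cosine similarities.
    Sentences with no similar neighbour simply pass on no score (a simplification)."""
    n = len(bags)
    if n == 0:
        return []
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out = [sum(row) for row in sim]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [
            (1 - damping) / n
            + damping * sum(rank[i] * sim[i][j] / out[i] for i in range(n) if out[i])
            for j in range(n)
        ]
    return normalize(rank)


def rank_sentences(title, sentences, weights=(1.0, 1.0, 1.0)):
    """Return sentence indices ordered from highest to lowest combined score.
    The weighted sum is an assumed combination rule, not taken from the thesis."""
    bags = [Counter(tokenize(s)) for s in sentences]
    title_bag = Counter(tokenize(title))
    word_part = hits_sentence_scores(bags)                                # words vs. sentences
    title_part = normalize([cosine(title_bag, bag) for bag in bags])      # title vs. sentences
    graph_part = pagerank_sentence_scores(bags)                           # sentences vs. sentences
    w1, w2, w3 = weights
    combined = [w1 * a + w2 * b + w3 * c for a, b, c in zip(word_part, title_part, graph_part)]
    return sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)


if __name__ == "__main__":
    sample = [
        "A storm flooded the coastal city on Monday.",
        "Officials said the storm damaged the city harbor.",
        "Tickets for the concert go on sale next week.",
    ]
    print(rank_sentences("Storm floods coastal city", sample))
```

Each component is scaled to a maximum of 1 before the sum so that no single perspective dominates purely because of its numeric range. A real system would likely add stop-word removal and stemming, and would tune the weights against ROUGE scores on held-out data.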