Title: | Text summarization using a trainable summarizer and latent semantic analysis |
Authors: | Yeh, JY; Ke, HR; Yang, WP; Meng, IH; Department of Computer Science; Library |
Keywords: | text summarization;corpus-based approach;latent semantic analysis;text relationship map |
Issue Date: | 1-Jan-2005 |
Abstract: | This paper proposes two approaches to text summarization: a modified corpus-based approach (MCBA) and an LSA-based T.R.M. approach (LSA + T.R.M.). The first is a trainable summarizer that generates summaries from several features: position, positive keyword, negative keyword, centrality, and resemblance to the title. It exploits two new ideas: (1) sentence positions are ranked to emphasize the significance of different sentence positions, and (2) the score function is trained by a genetic algorithm (GA) to obtain a suitable combination of feature weights. The second uses latent semantic analysis (LSA) to derive the semantic matrix of a document or a corpus and uses semantic sentence representations to construct a semantic text relationship map. We evaluate LSA + T.R.M. both on single documents and at the corpus level to investigate the competence of LSA in text summarization. The two approaches were evaluated at several compression rates on a corpus of 100 political articles. At a compression rate of 30%, the average f-measures were 49% for MCBA, 52% for MCBA + GA, and 44% and 40% for LSA + T.R.M. at the single-document and corpus levels, respectively. (C) 2004 Elsevier Ltd. All rights reserved. |
URI: | http://dx.doi.org/10.1016/j.ipm.2004.04.003 http://hdl.handle.net/11536/24779 |
ISSN: | 0306-4573 |
DOI: | 10.1016/j.ipm.2004.04.003 |
Journal: | INFORMATION PROCESSING & MANAGEMENT |
Volume: | 41 |
Issue: | 1 |
Begin Page: | 75 |
End Page: | 95 |
Appears in Collections: | Conferences Paper |
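
The abstract describes a score function that combines weighted sentence features, a compression rate that fixes summary length, and an f-measure for evaluation. The sketch below illustrates those three pieces only; it is not the paper's implementation. The linear form of the score, the negative sign applied to the negative-keyword feature, the sentence-level definition of the f-measure, and all function and feature names are assumptions made for illustration. In the paper the weights are obtained with a genetic algorithm, which is not shown here.

```python
from typing import Dict, List

# Feature names follow the abstract's list; the identifiers are hypothetical.
FEATURES = [
    "position",
    "positive_keyword",
    "negative_keyword",
    "centrality",
    "title_resemblance",
]


def score_sentence(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values (a linear form is assumed)."""
    total = 0.0
    for name in FEATURES:
        # Assumption: negative keywords lower the score, all others raise it.
        sign = -1.0 if name == "negative_keyword" else 1.0
        total += sign * weights.get(name, 0.0) * features.get(name, 0.0)
    return total


def summarize(
    sentence_features: List[Dict[str, float]],
    weights: Dict[str, float],
    compression_rate: float,
) -> List[int]:
    """Return indices of the top-scoring sentences, in document order."""
    n_keep = max(1, round(len(sentence_features) * compression_rate))
    ranked = sorted(
        range(len(sentence_features)),
        key=lambda i: score_sentence(sentence_features[i], weights),
        reverse=True,
    )
    return sorted(ranked[:n_keep])


def f_measure(selected: List[int], reference: List[int]) -> float:
    """Harmonic mean of precision and recall over selected sentence indices."""
    selected_set, reference_set = set(selected), set(reference)
    overlap = len(selected_set & reference_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected_set)
    recall = overlap / len(reference_set)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, calling `summarize` with a compression rate of 0.3 keeps roughly 30% of a document's sentences, and `f_measure` compares the selected indices against a reference extract.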