Title: | Hierarchical Theme and Topic Modeling |
Authors: | Chien, Jen-Tzung; College of Electrical and Computer Engineering |
Keywords: | Bayesian nonparametrics (BNPs);document summarization;structural learning;topic model |
Issue Date: | Mar-2016 |
Abstract: | Considering the hierarchical data groupings in a text corpus, e.g., words, sentences, and documents, we conduct structural learning and infer latent themes for sentences and latent topics for words from a collection of documents. The relation between themes and topics under different data groupings is explored through an unsupervised procedure that does not limit the number of clusters. A tree stick-breaking process is presented to draw theme proportions for different sentences. We build a hierarchical theme and topic model, which flexibly represents heterogeneous documents using Bayesian nonparametrics. Thematic sentences and topical words are extracted. In the experiments, the proposed method is shown to be effective in building a semantic tree structure for sentences and their corresponding words. The superiority of the tree model in selecting expressive sentences for document summarization is illustrated. |
URI: | http://dx.doi.org/10.1109/TNNLS.2015.2414658 http://hdl.handle.net/11536/133517 |
ISSN: | 2162-237X |
DOI: | 10.1109/TNNLS.2015.2414658 |
Journal: | IEEE Transactions on Neural Networks and Learning Systems |
Volume: | 27 |
Issue: | 3 |
Begin Page: | 565 |
End Page: | 578 |
Appears in Collections: | Articles |