Title: | Hierarchical Theme and Topic Modeling |
Author: | Chien, Jen-Tzung (College of Electrical and Computer Engineering) |
Keywords: | Bayesian nonparametrics (BNPs); document summarization; structural learning; topic model |
Issue date: | March 2016 |
Abstract: | Considering the hierarchical data groupings in a text corpus (words, sentences, and documents), we conduct structural learning and infer latent themes for sentences and latent topics for words from a collection of documents. The relation between themes and topics under different data groupings is explored through an unsupervised procedure that does not limit the number of clusters. A tree stick-breaking process is presented to draw theme proportions for different sentences. We build a hierarchical theme and topic model that flexibly represents heterogeneous documents using Bayesian nonparametrics. Thematic sentences and topical words are extracted. In the experiments, the proposed method is shown to be effective in building a semantic tree structure for sentences and their corresponding words. The superiority of the tree model in selecting expressive sentences for document summarization is illustrated. |
URI: | http://dx.doi.org/10.1109/TNNLS.2015.2414658 http://hdl.handle.net/11536/133517 |
ISSN: | 2162-237X |
DOI: | 10.1109/TNNLS.2015.2414658 |
Journal: | IEEE Transactions on Neural Networks and Learning Systems |
Volume: | 27 |
Issue: | 3 |
Start page: | 565 |
End page: | 578 |
Appears in collections: | Journal Articles |