Title: Hierarchical Theme and Topic Modeling
Author: Chien, Jen-Tzung
College of Electrical and Computer Engineering
Keywords: Bayesian nonparametrics (BNPs); document summarization; structural learning; topic model
Publication date: Mar-2016
Abstract: Considering the hierarchical data groupings in a text corpus, e.g., words, sentences, and documents, we conduct structural learning and infer the latent themes for sentences and the latent topics for words from a collection of documents. The relation between themes and topics under different data groupings is explored through an unsupervised procedure that does not limit the number of clusters. A tree stick-breaking process is presented to draw theme proportions for different sentences. We build a hierarchical theme and topic model, which flexibly represents heterogeneous documents using Bayesian nonparametrics. Thematic sentences and topical words are extracted. In the experiments, the proposed method is shown to be effective in building a semantic tree structure for sentences and their corresponding words. The advantage of using the tree model to select expressive sentences for document summarization is illustrated.
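Note: the abstract refers to a tree stick-breaking process for drawing theme proportions. As background only, the following is a minimal sketch of the standard flat, truncated stick-breaking construction, not the tree-structured process proposed in the paper. The names `stick_breaking`, `alpha`, and `truncation` are illustrative assumptions and do not come from the article.

```python
import random


def stick_breaking(alpha, truncation, rng):
    """Draw mixture proportions by truncated stick-breaking.

    Assumption: this is the ordinary flat construction, in which each
    break fraction is drawn from Beta(1, alpha). It is shown only to
    illustrate the idea and is not the paper's tree variant.
    """
    remaining = 1.0  # length of the stick that has not been assigned yet
    weights = []
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)  # share broken off the remainder
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    # Give the leftover mass to the last component so the weights sum to 1.
    weights.append(remaining)
    return weights


rng = random.Random(0)  # fixed seed so the draw is reproducible
proportions = stick_breaking(alpha=2.0, truncation=10, rng=rng)
```

In this sketch a larger `alpha` tends to spread the mass over more components, which is how such constructions avoid fixing the number of clusters in advance; the truncation level is a practical approximation chosen here for illustration.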
URI: http://dx.doi.org/10.1109/TNNLS.2015.2414658
http://hdl.handle.net/11536/133517
ISSN: 2162-237X
DOI: 10.1109/TNNLS.2015.2414658
Journal: IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
Volume: 27
Issue: 3
Start page: 565
End page: 578
Appears in collections: Articles