Title: Latent Dirichlet mixture model
Authors: Chien, Jen-Tzung
Lee, Chao-Hsi
Tan, Zheng-Hua
Department of Electrical and Computer Engineering
Keywords: Bayesian learning; Topic model; Dirichlet mixture model
Issue Date: 22-Feb-2018
Abstract: Text representation based on a latent topic model is seen as a non-Gaussian problem in which the observed words and latent topics are multinomial variables and the topic proportions are Dirichlet variables. A traditional topic model introduces a single Dirichlet prior to characterize the topic proportions. The words in a text document are represented by a random mixture of semantic topics. However, in the real world, a single Dirichlet distribution may not faithfully reflect the variations of topic proportions estimated from heterogeneous documents. To address these variations, we propose a new latent variable model in which latent topics and their proportions are learned by incorporating a prior based on a Dirichlet mixture model. The resulting latent Dirichlet mixture model (LDMM) is constructed for topic clustering as well as document clustering. Multiple Dirichlets provide a way to build structural latent variables for representation learning over a variety of topics. This study carries out inference for LDMM using variational Bayes and collapsed variational Bayes. The unsupervised LDMM is further extended to a supervised LDMM for text classification. Experiments on document representation, summarization and classification show the merit of the structural prior in LDMM topic models. (C) 2017 Elsevier B.V. All rights reserved.
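Illustrative note: the generative idea described in the abstract, where topic proportions come from a mixture of Dirichlets instead of a single Dirichlet, can be sketched roughly as below. This is a minimal sketch and not the authors' implementation. It assumes that one Dirichlet component is selected per document, which the abstract does not state explicitly. All names (`sample_dirichlet`, `generate_document`, `mix_weights`, `alphas`, `topic_word`) and all numeric values are hypothetical, and the variational Bayes and collapsed variational Bayes inference procedures are not shown.

```python
import random


def sample_dirichlet(alpha, rng):
    # Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(mix_weights, alphas, topic_word, n_words, rng):
    # Step 1 (assumption of this sketch): pick one Dirichlet component per document.
    component = rng.choices(range(len(mix_weights)), weights=mix_weights)[0]
    # Step 2: draw the document's topic proportions from the chosen Dirichlet.
    theta = sample_dirichlet(alphas[component], rng)
    words = []
    for _ in range(n_words):
        # Step 3: draw a topic from theta, then a word id from that topic's multinomial.
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        vocab_size = len(topic_word[topic])
        word = rng.choices(range(vocab_size), weights=topic_word[topic])[0]
        words.append(word)
    return component, theta, words


rng = random.Random(0)
mix_weights = [0.5, 0.5]                      # hypothetical mixture weights
alphas = [[5.0, 0.5, 0.5], [0.5, 0.5, 5.0]]   # one Dirichlet parameter vector per component
topic_word = [                                # hypothetical word distribution per topic
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
component, theta, words = generate_document(mix_weights, alphas, topic_word, 20, rng)
print(component, [round(t, 3) for t in theta], words)
```

With a single component in `mix_weights` and `alphas`, the sketch reduces to the single-Dirichlet prior that the abstract attributes to traditional topic models.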
URI: http://dx.doi.org/10.1016/j.neucom.2017.08.029
http://hdl.handle.net/11536/144459
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2017.08.029
Journal: NEUROCOMPUTING
Volume: 278
Begin Page: 12
End Page: 22
Appears in Collections: Articles