Full metadata record
DC Field | Value | Language
dc.contributor.author | Chien, Jen-Tzung | en_US
dc.contributor.author | Chueh, Chuang-Hua | en_US
dc.date.accessioned | 2014-12-08T15:28:42Z | -
dc.date.available | 2014-12-08T15:28:42Z | -
dc.date.issued | 2012 | en_US
dc.identifier.isbn | 978-1-4673-1026-0 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/20765 | -
dc.description.abstract | A topic model can be established by using Dirichlet distributions as priors to characterize latent topics in natural language. However, topics in real-world stream data are non-stationary, which makes training a reliable topic model challenging. Further, word usage varies across paragraphs within a document due to different composition styles. This study presents a hierarchical segmentation model that compensates for heterogeneous topics at the stream level and heterogeneous words at the document level. The topic similarity between sentences is calculated to form a beta prior for stream-level segmentation. This segmentation prior is adopted to group topic-coherent sentences into a pseudo-document. For each pseudo-document, a Markov chain is incorporated to detect stylistic segments within the document, where the words in a segment are generated by an identical composition style. The new model is inferred by a variational Bayesian EM procedure. Experimental results show the benefits of the proposed model in terms of perplexity and F measure. | en_US
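The abstract's stream-level step, where topic similarity between adjacent sentences shapes a beta prior on segment boundaries, can be sketched as follows. This is a minimal illustration only: it assumes per-sentence topic distributions are already available, and the `segmentation_prior` function, its `strength` parameter, and the similarity-to-parameter mapping are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def segmentation_prior(topic_dists, strength=5.0):
    """Map adjacent-sentence topic similarity to Beta prior parameters.

    topic_dists: (n_sentences, n_topics) per-sentence topic mixtures.
    Returns (alpha, beta) for a Beta prior on the probability of placing
    a segment boundary after each of the first n-1 sentences; dissimilar
    neighbouring topics push prior mass toward a boundary.
    """
    t = np.asarray(topic_dists, dtype=float)
    u, v = t[:-1], t[1:]
    # cosine similarity between each sentence and its successor
    sims = np.sum(u * v, axis=1) / (
        np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    )
    alpha = 1.0 + strength * (1.0 - sims)  # boundary pseudo-counts
    beta = 1.0 + strength * sims           # no-boundary pseudo-counts
    return alpha, beta

# Three sentences: the first two share a topic, the third shifts topic,
# so the prior mean should favour a boundary after sentence 2.
dists = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
alpha, beta = segmentation_prior(dists)
boundary_prob = alpha / (alpha + beta)  # Beta prior mean per gap
```

In the paper's setting these Beta priors would feed the Bernoulli boundary variables inferred by the variational Bayesian EM procedure; here they are only computed and returned.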
dc.language.iso | en_US | en_US
dc.subject | Machine Learning | en_US
dc.subject | Topic Model | en_US
dc.subject | Graphical Model | en_US
dc.subject | Hierarchical Segmentation | en_US
dc.title | LATENT DIRICHLET LEARNING FOR HIERARCHICAL SEGMENTATION | en_US
dc.type | Proceedings Paper | en_US
dc.identifier.journal | 2012 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) | en_US
dc.contributor.department | 電機資訊學士班 (Undergraduate Honors Program of Electrical Engineering and Computer Science) | zh_TW
dc.contributor.department | Undergraduate Honors Program of Electrical Engineering and Computer Science | en_US
dc.identifier.wosnumber | WOS:000311966000063 | -
Appears in Collections: Conferences Paper