Full metadata record
DC Field | Value | Language
dc.contributor.author | Chang, Ying-Lan | en_US
dc.contributor.author | Chien, Jen-Tzung | en_US
dc.date.accessioned | 2014-12-08T15:30:03Z | -
dc.date.available | 2014-12-08T15:30:03Z | -
dc.date.issued | 2012 | en_US
dc.identifier.isbn | 978-1-4673-2507-3 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/21520 | -
dc.description.abstract | Backoff smoothing and topic modeling are crucial issues in n-gram language modeling. This paper presents a Bayesian nonparametric learning approach to tackle these two issues. We develop a topic-based language model in which the numbers of topics and n-grams are automatically determined from data. To cope with this model selection problem, we introduce nonparametric priors for topics and backoff n-grams. The infinite language models are constructed through the hierarchical Dirichlet process compound Pitman-Yor (PY) process. We develop the topic-based hierarchical PY language model (THPY-LM) with power-law behavior. This model can be simplified to the hierarchical PY (HPY) LM by disregarding the topic information, and further to the modified Kneser-Ney (MKN) LM by also disregarding the Bayesian treatment. In the experiments, the proposed THPY-LM outperforms state-of-the-art methods using MKN-LM and HPY-LM. | en_US
dc.language.iso | en_US | en_US
dc.subject | language model | en_US
dc.subject | backoff smoothing | en_US
dc.subject | topic model | en_US
dc.subject | Bayesian nonparametrics | en_US
dc.title | BAYESIAN NONPARAMETRIC LANGUAGE MODELS | en_US
dc.type | Proceedings Paper | en_US
dc.identifier.journal | 2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING | en_US
dc.citation.spage | 188 | en_US
dc.citation.epage | 192 | en_US
dc.contributor.department | 電機資訊學士班 | zh_TW
dc.contributor.department | Undergraduate Honors Program of Electrical Engineering and Computer Science | en_US
dc.identifier.wosnumber | WOS:000316984700047 | -
Appears in Collections: Conferences Paper