Title: | BAYESIAN NONPARAMETRIC LANGUAGE MODELS |
Authors: | Chang, Ying-Lan; Chien, Jen-Tzung; Undergraduate Honors Program of Electrical Engineering and Computer Science |
Keywords: | language model;backoff smoothing;topic model;Bayesian nonparametrics |
Issue Date: | 2012 |
Abstract: | Backoff smoothing and topic modeling are crucial issues in n-gram language modeling. This paper presents a Bayesian nonparametric learning approach to tackle these two issues. We develop a topic-based language model where the numbers of topics and n-grams are automatically determined from data. To cope with this model selection problem, we introduce nonparametric priors for topics and backoff n-grams. The infinite language models are constructed through the hierarchical Dirichlet process compound Pitman-Yor (PY) process. We develop the topic-based hierarchical PY language model (THPY-LM) with power-law behavior. This model can be simplified to the hierarchical PY (HPY) LM by disregarding the topic information, and to the modified Kneser-Ney (MKN) LM by further disregarding the Bayesian treatment. In the experiments, the proposed THPY-LM outperforms state-of-the-art methods using MKN-LM and HPY-LM. |
URI: | http://hdl.handle.net/11536/21520 |
ISBN: | 978-1-4673-2507-3 |
Journal: | 2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING |
Start page: | 188 |
End page: | 192 |
Appears in Collections: | Conferences Paper |