Title: | Hierarchical Pitman-Yor-Dirichlet Language Model |
Author: | Chien, Jen-Tzung; Undergraduate Honors Program of Electrical Engineering and Computer Science |
Keywords: | Bayesian nonparametrics;language model;speech recognition;topic model;unsupervised learning |
Issue Date: | 1-Aug-2015 |
Abstract: | Probabilistic models are often viewed as insufficiently expressive because of the strong limitations and assumptions placed on the probability distribution and the fixed model complexity. Bayesian nonparametric learning pursues an expressive probabilistic representation based on nonparametric prior and posterior distributions, with a less assumption-laden approach to inference. This paper presents the hierarchical Pitman-Yor-Dirichlet (HPYD) process as a nonparametric prior to infer the predictive probabilities of smoothed n-grams with integrated topic information. A metaphor of the hierarchical Chinese restaurant process is proposed to infer the HPYD language model (HPYD-LM) via Gibbs sampling. This process is equivalent to implementing the hierarchical Dirichlet process-latent Dirichlet allocation (HDP-LDA) with the twisted hierarchical Pitman-Yor LM (HPY-LM) as base measures. Accordingly, the estimated HPYD-LM produces power-law distributions and extracts semantic topics that reflect the properties of natural language. The superiority of HPYD-LM over HPY-LM and other language models is demonstrated by experiments on model perplexity and speech recognition. |
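The Chinese restaurant metaphor named in the abstract can be illustrated with a minimal sketch of the (non-hierarchical) Pitman-Yor Chinese restaurant process, whose discount parameter yields the power-law table-size distributions the paper exploits. This is a generic illustration, not the paper's HPYD-LM inference; the function name, parameters `d` (discount) and `theta` (concentration), and the flat single-restaurant setting are all simplifying assumptions.

```python
import random

def pitman_yor_crp(n_customers, d=0.5, theta=1.0, seed=0):
    """Sample a seating arrangement from the Pitman-Yor Chinese
    restaurant process. Customer n+1 sits at an existing table k
    with probability proportional to (c_k - d), where c_k is the
    table's occupancy, or opens a new table with probability
    proportional to (theta + d * t), where t is the current number
    of tables. (Illustrative sketch only, not the paper's
    hierarchical HPYD sampler.)"""
    rng = random.Random(seed)
    tables = []  # occupancy count per table
    for n in range(n_customers):
        p_new = (theta + d * len(tables)) / (n + theta)
        if not tables or rng.random() < p_new:
            tables.append(1)  # open a new table
        else:
            # pick an existing table with weight (c_k - d)
            weights = [c - d for c in tables]
            r = rng.random() * sum(weights)
            for k, w in enumerate(weights):
                r -= w
                if r <= 0:
                    tables[k] += 1
                    break
    return tables
```

With a positive discount `d`, the number of occupied tables grows like a power of the number of customers, so a few tables become very large while many stay small; this is the power-law behavior that makes Pitman-Yor priors a better fit for word frequencies than the Dirichlet process (the `d = 0` case).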
URI: | http://dx.doi.org/10.1109/TASLP.2015.2428632 http://hdl.handle.net/11536/127849 |
ISSN: | 2329-9290 |
DOI: | 10.1109/TASLP.2015.2428632 |
Journal: | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING |
Volume: | 23 |
Start Page: | 1259 |
End Page: | 1272 |
Appears in Collections: | Articles |