Title: 混合分配的貝氏分析:推論混合分配中組成成份的個數及處理標籤互換現象的重排序方法
Bayesian Inferences of Finite Mixture Models: Approaches for an Unknown Number of Components and Label Switching
Authors: 潘家群
Pan, Jia-Chiun
黃冠華
Huang, Guan-Hua
Institute of Statistics
Keywords: Label switching; Bayesian analysis; Mixture model; Reversible jump Markov chain Monte Carlo; Markov chain Monte Carlo; Latent class analysis; Reversible jump; transdimensional; identifiability
Issue date: 2011
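Abstract: This dissertation consists of two parts. In the first part, we apply reversible jump Markov chain Monte Carlo to carry out a Bayesian analysis of the unknown number of latent classes in latent class models. Unlike previous latent class analyses, the number of latent classes is treated as a random variable, so the method yields the posterior distributions of the model parameters and of the number of latent classes at the same time, allowing joint Bayesian inference. We also carry out a sensitivity analysis of the prior distributions to understand how the hyperparameter values should be set. Finally, we verify the accuracy of the Bayesian inference with simulated data and apply the method to Positive and Negative Syndrome Scale (PANSS) data from patients with schizophrenia in order to classify the patients into latent classes. The second part focuses on relabelling methods for handling label switching in Bayesian analysis. In the Bayesian analysis of mixture distributions, permuting the order of the component parameters does not change the value of the likelihood function, so the posterior distribution of the component parameters simulated by Markov chain Monte Carlo is a posterior over multiple permutations. This is known as label switching, and it makes it impossible to carry out Bayesian analysis of these parameters under a single common permutation. This dissertation finds that, under certain conditions, the posterior samples of the parameters can be reordered to a single common permutation (that is, the correct labelling), which correctly resolves the difficulties that label switching causes for Bayesian analysis. For cases where these conditions are not met, we propose three relabelling methods. The first applies when the marginal response variables of the observed individuals are nearly independent of one another in the Bayesian analysis (for example, when the sample size is large). The other two handle the dependent case by building statistical models. Finally, we compare the three methods with other existing methods on simulated data, and we use the PANSS analysis of patients with schizophrenia from the first part to illustrate a case in which the correct labelling can be found.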
This dissertation consists of two parts. The first part focuses on analyzing data collected in situations where investigators use multiple discrete indicators as surrogates, for example, a set of questionnaire items. A very flexible latent class model is used for the analysis. We propose a Bayesian framework for jointly estimating the number of latent classes and the model parameters. The proposed approach applies reversible jump Markov chain Monte Carlo to finite mixtures of multivariate multinomial distributions. We have carried out a detailed sensitivity analysis for various hyperparameter specifications, which leads us to make standard default recommendations for the choice of priors. The usefulness of the proposed method is demonstrated through computer simulations and a study of subtypes of schizophrenia using the Positive and Negative Syndrome Scale (PANSS). The second part proposes relabelling methods to deal with the label switching problem in Bayesian finite mixture models. The label switching problem arises because the posterior distribution is nonidentifiable under permutations of the component labels. Here we show that, under some conditions, the correct labelling can be found, and we propose three relabelling algorithms to solve the label switching problem when the conditions are not met. The first algorithm is developed under the assumption that the allocation variables of individuals are nearly independent in the posterior samples, an assumption that holds approximately when the sample size is large. The other two algorithms are developed for the case where the sample size is relatively small. They model the dependency among the posterior distributions of the allocation variables, so that the approach used in the first algorithm remains valid. The success of the algorithms is demonstrated in two Monte Carlo simulations and on real datasets.
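The label switching problem described in the abstract can be illustrated with a minimal sketch. It is not one of the three relabelling algorithms proposed in the dissertation: it assumes a univariate two-component Gaussian mixture (the dissertation works with multivariate multinomial latent class models) and uses a simple ordering constraint on the component means, a common baseline. The function names `mixture_loglik` and `relabel_by_mean`, the data, and the parameter values are all hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(data, weights, means, sds):
    """Log-likelihood of a finite Gaussian mixture (illustrative assumption)."""
    total = 0.0
    for x in data:
        density = sum(
            w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)
        )
        total += math.log(density)
    return total


def relabel_by_mean(draw):
    """Reorder the components of one posterior draw so the means increase.

    This is an ordering-constraint baseline, not the dissertation's method.
    """
    weights, means, sds = draw
    order = sorted(range(len(means)), key=lambda k: means[k])
    return (
        [weights[k] for k in order],
        [means[k] for k in order],
        [sds[k] for k in order],
    )


# Hypothetical data and two draws that differ only in component labels.
data = [-2.1, -1.9, -2.4, 3.0, 3.2, 2.7]
draw = ([0.4, 0.6], [3.0, -2.0], [1.0, 0.5])
swapped = ([0.6, 0.4], [-2.0, 3.0], [0.5, 1.0])

# Permuting the labels leaves the likelihood unchanged (up to rounding).
ll_draw = mixture_loglik(data, *draw)
ll_swapped = mixture_loglik(data, *swapped)

# After relabelling, both draws share one common labelling.
relabelled = relabel_by_mean(draw)
```

Because both labellings give the same likelihood, an MCMC sampler can visit either one, and component-wise posterior summaries become meaningless until the draws are mapped to a common labelling. An ordering constraint works only when the components are well separated on the chosen parameter, which is one reason more careful relabelling methods are studied.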
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079526801
http://hdl.handle.net/11536/41254
Appears in collections: Theses