Title: Variable selection in the latent class model and its application in traffic flow prediction (Chinese title: 潛在類別模型的變數選擇與在交通流量預測的應用)
Authors: Chen, Pei-Wen (陳珮文)
Huang, Guan-Hua (黃冠華)
Institute of Statistics
Keywords: regression extension of latent class analysis; high-dimensional data; variable selection; alternate k-means clustering; traffic flow prediction
Publication date: 2017
Abstract: Parameters in the regression extension of latent class analysis (RLCA) model can be estimated by clustering the observation units, which yields an estimate of the latent class variable. For high-dimensional data, variable selection in cluster analysis becomes an important issue. We propose an alternate k-means clustering algorithm that first distinguishes clustering variables from noise variables (surrogates) and then excludes the noise variables from clustering. This improves the accuracy of the clustering results and thus gives a better estimate of the latent class variable. Treating the estimated latent classes as known, one can then estimate the remaining parameters of the RLCA model. Using this variable-selecting RLCA model, we develop a method for predicting traffic flow on National Freeway No. 5 in Taiwan, with the aim of predicting the whole-day traffic flow on consecutive holidays that are three or more months in the future. We also compare the prediction results obtained with the traditional k-means algorithm and with the proposed alternate k-means algorithm.
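The abstract does not spell out the alternate k-means procedure, so the following is only a minimal sketch under our own assumptions, not the thesis's algorithm. It alternates between (a) running k-means on the currently selected variables and (b) rescoring every variable by the share of its variance explained by the resulting partition (between-cluster sum of squares divided by total sum of squares), keeping only variables at or above a cutoff. The helper names `kmeans`, `between_ratio`, and `alternate_kmeans`, the k-means++ initialization, the 0.3 `threshold`, and the simulated data are all hypothetical choices made for illustration. Variables should be put on comparable scales (for example, standardized) before calling it.

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ starts; returns the labels of the lowest-SSE run."""
    rng = np.random.default_rng(seed)
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        # k-means++ initialization: pick new centers with probability proportional to D^2
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # keep the old center if a cluster becomes empty
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        sse = ((X - centers[labels]) ** 2).sum()
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def between_ratio(X, labels):
    """Per-variable share of total sum of squares explained by the partition."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for j in np.unique(labels):
        Xj = X[labels == j]
        within += ((Xj - Xj.mean(axis=0)) ** 2).sum(axis=0)
    safe = np.where(total > 0, total, 1.0)
    # constant columns get a ratio of 0 so they are never selected
    return np.where(total > 0, 1.0 - within / safe, 0.0)


def alternate_kmeans(X, k, threshold=0.3, max_rounds=10, seed=0):
    """Hypothetical sketch: alternate clustering and variable (re)selection."""
    selected = np.ones(X.shape[1], dtype=bool)
    for _ in range(max_rounds):
        labels = kmeans(X[:, selected], k, seed=seed)
        # score ALL variables against the current partition
        new_selected = between_ratio(X, labels) >= threshold
        if not new_selected.any() or np.array_equal(new_selected, selected):
            return labels, selected
        selected = new_selected
    return kmeans(X[:, selected], k, seed=seed), selected


# Simulated data: 2 informative columns with 3 groups, plus 3 pure-noise columns.
rng = np.random.default_rng(1)
true = np.repeat([0, 1, 2], 50)
group_means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
informative = group_means[true] + rng.normal(size=(150, 2))
noise = rng.normal(size=(150, 3))
X = np.hstack([informative, noise])

labels, selected = alternate_kmeans(X, k=3)
print(selected.tolist())
```

In this toy example the two informative columns should be kept and the three noise columns dropped, and the final labels should match the simulated groups up to relabeling. Following the abstract, those labels would then be treated as known when estimating the remaining RLCA parameters; that step is not sketched here.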
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070452608
http://hdl.handle.net/11536/141453
Appears in Collections: Theses