Title: Model Selection for Generalized Linear Models Using Penalized Likelihood (廣義線性模型使用懲罰概似函數之模型選取)
Authors: Sung, Pei-Yun (宋佩芸)
Huang, Hsin-Cheng (黃信誠)
Institute of Statistics
Keywords: Akaike information criterion; Bayesian information criterion; coordinate descent; difference convex programming; Lasso; L0 penalty; oracle property
Issue Date: 2010
Abstract: As higher-dimensional data become available, identifying the important variables among many candidates has become increasingly important in linear regression and generalized linear regression. Many approaches have been proposed in the literature, including Akaike's information criterion (AIC), the Bayesian information criterion (BIC), and the Lasso. However, both AIC and BIC are difficult to implement when the number of variables is large, and in practice they are usually applied through stepwise selection procedures. In this thesis, we extend the approach of Shen et al. (2010), who considered a penalized least squares method with a truncated L1 penalty for linear regression, to generalized linear regression. This approach approximates AIC and BIC well even when the number of variables is large. We introduce a computationally efficient algorithm that combines difference convex programming, iteratively reweighted penalized least squares, and coordinate descent. An oracle property of the proposed method is established, and numerical examples are provided to demonstrate its superiority over AIC and BIC. Finally, the proposed method is applied to a low birth weight dataset to identify important variables associated with low birth weight in newborns.
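The combination of a truncated L1 penalty, difference convex (DC) programming, and coordinate descent named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes the penalty has the form sum_j min(|b_j| / tau, 1), covers only the Gaussian (least squares) case, and approximates each DC step by a weighted Lasso that penalizes only the coefficients currently at or below tau. The function names and the parameters `lam`, `tau`, `n_sweeps`, and `n_dc_steps` are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def truncated_l1(beta, tau):
    """Assumed penalty form: sum_j min(|beta_j| / tau, 1)."""
    return sum(min(abs(b) / tau, 1.0) for b in beta)


def weighted_lasso_cd(X, y, weights, beta, n_sweeps=200, tol=1e-10):
    """Coordinate descent for (1/2n)||y - X b||^2 + sum_j weights[j] * |b_j|.

    X is a list of rows, y a list of responses; columns are assumed nonzero.
    """
    n, p = len(X), len(X[0])
    beta = list(beta)
    # Residuals at the starting coefficients.
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    # Mean squared value of each column, (1/n) * ||x_j||^2.
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # (1/n) * x_j' (partial residual with column j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, weights[j]) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def tlp_least_squares(X, y, lam, tau, n_dc_steps=10):
    """DC iterations: each step solves a weighted Lasso by coordinate descent."""
    beta = [0.0] * len(X[0])
    for _ in range(n_dc_steps):
        # Coefficients already larger than tau are left unpenalized in this
        # step; the rest receive an L1 weight of lam / tau.
        weights = [lam / tau if abs(b) <= tau else 0.0 for b in beta]
        new_beta = weighted_lasso_cd(X, y, weights, beta)
        converged = all(abs(a - b) < 1e-8 for a, b in zip(new_beta, beta))
        beta = new_beta
        if converged:
            break
    return beta
```

Calling `tlp_least_squares(X, y, lam=0.1, tau=0.5)` on a design matrix `X` (list of rows) and response `y` returns the coefficient list, with excluded variables set exactly to zero. For a generalized linear model, one would presumably wrap the weighted Lasso step in an iteratively reweighted least squares loop that replaces `y` with a working response and adds observation weights; that extension is not shown here.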
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079826506
http://hdl.handle.net/11536/47672
Appears in Collections: Thesis