Title: 問卷單複選題的遺失值補值法比較
The comparison of two missing values imputation methods in single and multiple choice question
Author: 孫承彬
Sun, Cheng-Bin
王秀瑛
Wang, Hsiu-Ying
Institute of Statistics
Keywords: Missing values; Regression analysis; Imputation; Multiple-choice question; Nearest-neighbor method; KNN; Linear regression
Issue date: 2015
Abstract: Questionnaires are a common way to collect data, and they generally contain both single-response and multiple-response questions. Collected questionnaire data often contain missing values, so statistical methods can be used to impute them and improve the accuracy of the survey results. This thesis compares how two widely used imputation methods, the K-nearest neighbors (KNN) algorithm and linear regression, perform on missing values in single-response and multiple-response questions. We simulate data with a linear relationship and compare the imputation accuracy of the two methods under different conditions, such as the missing rate, the number of questions, and the number of choices per question. In addition to the simulation study, we apply both methods to a real data set and compare those results with the simulation results.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070252621
http://hdl.handle.net/11536/126073
Appears in collections: Thesis