Full metadata record
DC Field | Value | Language
dc.contributor.author | 黃鴻儒 | en_US
dc.contributor.author | Hung-Ju Huang | en_US
dc.contributor.author | 許鈞南 | en_US
dc.contributor.author | 李嘉晃 | en_US
dc.contributor.author | Chun-Nan Hsu | en_US
dc.contributor.author | Chia-Hoang Lee | en_US
dc.date.accessioned | 2014-12-12T02:27:49Z | -
dc.date.available | 2014-12-12T02:27:49Z | -
dc.date.issued | 2001 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#NT900394057 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/68583 | -
dc.description.abstract | The naive Bayesian classifier is a simple and useful classification tool. It has been widely used on systems with discrete variables, but it has difficulty handling other, nondiscrete variables such as continuous variables. This thesis discusses how to apply the naive Bayesian classifier to several common types of nondiscrete and aggregate data. For continuous variables, it shows that, in general, discretizing them yields better results than assuming they follow a normal distribution, and it explains why the various discretization methods proposed in previous work all perform about equally well for the naive Bayesian classifier. Based on this analysis, we propose a discretization method called lazy discretization, which discretizes continuous variables dynamically according to the test data. This method not only discretizes continuous variables dynamically and effectively, but also enables the naive Bayesian classifier to answer classification queries over set-valued, interval, and multi-interval data. For aggregate data, this thesis investigates how to use the naive Bayesian classifier to classify homologous sets, which we define as sets whose samples come from the same unknown class; data of this kind arise in many applications. We study in depth how to exploit the knowledge that every sample in a homologous set belongs to the same unknown class to improve the classification accuracy of the naive Bayesian classifier. We propose a method, called the homologous naive Bayesian classifier, which extends the naive Bayesian classifier to take an entire homologous set as a single input object. Compared with the commonly used voting method and several other methods, it clearly outperforms them, and it works well even when the number of samples in the homologous set is small. We have also applied this method successfully to speaker recognition, although its applicability is not limited to speaker recognition systems. | zh_TW
dc.description.abstract | Naive Bayes is a simple and useful classification tool. It is most commonly used in situations in which all the variables are discrete, because it is difficult for naive Bayes to model complex probability densities over nondiscrete data such as continuous variables. This thesis describes how to use naive Bayes to classify several types of nondiscrete and aggregate data. We show that, in general, discretization of continuous variables can outperform parameter estimation that assumes a normal distribution. Based on our analysis, we explain why a wide variety of well-known discretization methods perform well with only insignificant differences. Our analysis leads to a lazy discretization method, which dynamically discretizes continuous variables according to the test data. This method can be extended to allow a naive Bayes classifier to handle set-valued, interval, and multi-interval query data. We also address the problem of classifying a set of query vectors known to belong to the same unknown class. Such sets, which we call homologous sets, arise in many application domains. We show how to take advantage of homologous sets to improve classification accuracy over classifying each query vector individually. Our method, called homologous naive Bayes (HNB), uses a modified classification procedure that classifies multiple instances as a single unit. Compared with a voting method and several other variants of naive Bayes classification, HNB significantly outperforms them on a variety of test data sets, even when the number of query vectors in a homologous set is small. We also report a successful application of HNB to speaker recognition. (A minimal illustrative sketch of the HNB decision rule, contrasted with voting, follows this record.) | en_US
dc.language.iso | en_US | en_US
dc.subject | Naive Bayesian classifier | zh_TW
dc.subject | Continuous variable | zh_TW
dc.subject | Interval query | zh_TW
dc.subject | Homologous set | zh_TW
dc.subject | Speaker recognition | zh_TW
dc.subject | Naive Bayesian classifier | en_US
dc.subject | Continuous variable | en_US
dc.subject | Interval query | en_US
dc.subject | Homologous set | en_US
dc.subject | Speaker recognition | en_US
dc.title | An Analysis and Application of Naive Bayesian Classifiers for Nondiscrete and Aggregate Data | zh_TW
dc.title | A Study of Naive Bayesian Classifiers for Nondiscrete and Aggregate Data | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Computer Science and Engineering | zh_TW
Appears in Collections: Thesis
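The following is a minimal, self-contained Python sketch of the homologous-set idea summarized in the abstracts above: classify an entire homologous set as a single unit rather than voting over per-vector predictions. It is illustrative only and not the thesis implementation; the class name GaussianNB, the per-class Gaussian feature model (standing in for the discretization schemes the thesis actually studies), the exact pooling rule, and the toy data are all assumptions made for this sketch.

```python
# Illustrative sketch only -- not the thesis implementation of HNB.
import math
from collections import defaultdict


class GaussianNB:
    """Naive Bayes with per-class Gaussian feature models (an assumed stand-in
    for the discretized conditional probabilities studied in the thesis)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        n = len(X)
        self.log_prior = {c: math.log(len(rows) / n) for c, rows in groups.items()}
        self.stats = {}
        for c, rows in groups.items():
            self.stats[c] = []
            for col in zip(*rows):  # one column per feature
                mu = sum(col) / len(col)
                var = max(1e-6, sum((v - mu) ** 2 for v in col) / len(col))
                self.stats[c].append((mu, var))
        return self

    def _log_lik(self, x, c):
        # sum_j log N(x_j | mu_cj, var_cj), using the conditional-independence
        # assumption of naive Bayes.
        return sum(-0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
                   for v, (mu, var) in zip(x, self.stats[c]))

    def predict_one(self, x):
        # Ordinary naive Bayes: argmax_c log P(c) + log P(x | c).
        return max(self.stats, key=lambda c: self.log_prior[c] + self._log_lik(x, c))

    def predict_homologous(self, X_set):
        # One natural "single unit" rule consistent with the abstract's
        # description of HNB: the whole set shares one class, so pool the
        # evidence -- argmax_c log P(c) + sum_i log P(x_i | c).
        return max(self.stats,
                   key=lambda c: self.log_prior[c] + sum(self._log_lik(x, c) for x in X_set))

    def predict_by_voting(self, X_set):
        # Baseline the abstracts compare against: classify each query vector
        # independently, then take a majority vote.
        votes = defaultdict(int)
        for x in X_set:
            votes[self.predict_one(x)] += 1
        return max(votes, key=votes.get)


if __name__ == "__main__":
    # Hypothetical toy data: class "a" centered near 0.0, class "b" near 1.0.
    X = [[0.1], [-0.2], [0.0], [0.9], [1.1], [1.0]]
    y = ["a", "a", "a", "b", "b", "b"]
    nb = GaussianNB().fit(X, y)
    query_set = [[0.6], [0.4], [0.2]]  # assumed to share one unknown class
    print(nb.predict_by_voting(query_set), nb.predict_homologous(query_set))
```

Because the pooled rule sums log-likelihoods over the whole set before taking the argmax, ambiguous individual vectors contribute soft evidence instead of hard votes, which is the intuition behind the reported advantage of HNB over voting when homologous sets are small.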