Full metadata record
DC Field | Value | Language
dc.contributor.author | Hsu, CN | en_US
dc.contributor.author | Huang, HJ | en_US
dc.contributor.author | Wong, TT | en_US
dc.date.accessioned | 2014-12-08T15:40:01Z | -
dc.date.available | 2014-12-08T15:40:01Z | -
dc.date.issued | 2003-12-01 | en_US
dc.identifier.issn | 0885-6125 | en_US
dc.identifier.uri | http://dx.doi.org/10.1023/A:1026367023636 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/27341 | -
dc.description.abstract | In a naive Bayesian classifier, discrete variables as well as discretized continuous variables are assumed to have Dirichlet priors. This paper describes the implications and applications of this model selection choice. We start by reviewing key properties of Dirichlet distributions. Among these properties, the most important one is "perfect aggregation," which allows us to explain why discretization works for a naive Bayesian classifier. Since perfect aggregation holds for Dirichlets, we can explain that in general, discretization can outperform parameter estimation assuming a normal distribution. In addition, we can explain why a wide variety of well-known discretization methods, such as entropy-based, ten-bin, and bin-log l, can perform well with insignificant difference. We designed experiments to verify our explanation using synthesized and real data sets and showed that in addition to well-known methods, a wide variety of discretization methods all perform similarly. Our analysis leads to a lazy discretization method, which discretizes continuous variables according to test data. The Dirichlet assumption implies that lazy methods can perform as well as eager discretization methods. We empirically confirmed this implication and extended the lazy method to classify set-valued and multi-interval data with a naive Bayesian classifier. | en_US
dc.language.iso | en_US | en_US
dc.subject | naive Bayesian classifiers | en_US
dc.subject | Dirichlet distributions | en_US
dc.subject | perfect aggregation | en_US
dc.subject | continuous variables | en_US
dc.subject | discretization | en_US
dc.subject | lazy discretization | en_US
dc.subject | interval data | en_US
dc.title | Implications of the Dirichlet assumption for discretization of continuous variables in naive Bayesian classifiers | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.1023/A:1026367023636 | en_US
dc.identifier.journal | MACHINE LEARNING | en_US
dc.citation.volume | 53 | en_US
dc.citation.issue | 3 | en_US
dc.citation.spage | 235 | en_US
dc.citation.epage | 263 | en_US
dc.contributor.department | 資訊工程學系 | zh_TW
dc.contributor.department | Department of Computer Science | en_US
dc.identifier.wosnumber | WOS:000186206000002 | -
dc.citation.woscount | 18 | -
Appears in Collections: Articles


Files in This Item:

  1. 000186206000002.pdf
