Title: DBSP: An Effective Multivariate Density Estimation Method Based on Distributed Bayesian Sequential Partitioning
DBSP: Distributed-computing Bayesian Sequential Partitioning for Multivariate Density Estimation
Authors: 蔡沛原
Pei-Yuan Tsai
盧鴻興
Lu, Horng-Shing
Institute of Statistics
Keywords: Bayesian sequential partitioning; density estimation; distributed computing; divide and conquer
Publication date: 2014
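Abstract: Bayesian Sequential Partitioning (BSP) is a new data-driven, nonparametric method for estimating probability density functions. It uses a Bayesian approach to sequentially construct an effective partition of the data space and then builds a histogram on that partition. Compared with traditional histogram methods, BSP avoids many unnecessary cuts and still estimates accurately for high-dimensional data. However, a pure software implementation of the algorithm takes considerable time to finish even for ordinary data sizes. This thesis therefore investigates how to accelerate the BSP algorithm and proposes a strategy that reduces computation time, handles larger data volumes, and aims for better density estimates.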
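The partition-then-histogram idea described above can be sketched in Python. This is a simplified, hypothetical illustration, not the thesis's implementation. It assumes the data are rescaled to [0, 1)^d, that every cut bisects one coordinate at its midpoint, and that a partition is scored by a Dirichlet-multinomial marginal likelihood of the histogram plus an assumed penalty `beta` per region. Instead of exploring the posterior over partitions as BSP does, it greedily keeps the single best-scoring bisection at each step. The names `log_score`, `greedy_bsp`, `alpha` and `beta` are illustrative only.

```python
import math
import random

# A region is (lo, hi, points): a half-open box [lo, hi) plus the points inside it.

def volume(region):
    lo, hi, _ = region
    return math.prod(h - l for l, h in zip(lo, hi))

def log_score(counts, volumes, alpha=0.5, beta=1.0):
    """Unnormalized log posterior of a partition (assumed form): the
    Dirichlet(alpha)-multinomial marginal likelihood of the histogram,
    minus an assumed penalty of beta per region."""
    m, n = len(counts), sum(counts)
    s = math.lgamma(m * alpha) - math.lgamma(n + m * alpha)
    for c, v in zip(counts, volumes):
        s += math.lgamma(c + alpha) - math.lgamma(alpha) - c * math.log(v)
    return s - beta * m

def bisect_region(region, k):
    """Cut a region in half at the midpoint of coordinate k."""
    lo, hi, pts = region
    mid = (lo[k] + hi[k]) / 2
    left_hi, right_lo = list(hi), list(lo)
    left_hi[k] = right_lo[k] = mid
    left = (lo, left_hi, [p for p in pts if p[k] < mid])
    right = (right_lo, hi, [p for p in pts if p[k] >= mid])
    return left, right

def greedy_bsp(points, d, max_splits=50, **kw):
    """Greedy stand-in for BSP's search: repeatedly apply the single bisection
    that most increases log_score; stop when no bisection helps."""
    def score(rs):
        return log_score([len(r[2]) for r in rs], [volume(r) for r in rs], **kw)
    regions = [([0.0] * d, [1.0] * d, list(points))]
    best = score(regions)
    for _ in range(max_splits):
        improved = None
        for i, r in enumerate(regions):
            for k in range(d):
                trial = regions[:i] + list(bisect_region(r, k)) + regions[i + 1:]
                s = score(trial)
                if s > best:
                    best, improved = s, trial
        if improved is None:
            break
        regions = improved
    return regions

def density(regions, x):
    """Histogram estimate n_j / (n * |r_j|) at x, assuming x lies in [0, 1)^d."""
    n = sum(len(r[2]) for r in regions)
    for r in regions:
        lo, hi, pts = r
        if all(l <= xi < h for l, xi, h in zip(lo, x, hi)):
            return len(pts) / (n * volume(r))
    return 0.0

# Usage: 200 points clustered in [0, 0.25)^2 plus 20 uniform background points.
rng = random.Random(0)
pts = [(rng.uniform(0, 0.25), rng.uniform(0, 0.25)) for _ in range(200)]
pts += [(rng.random(), rng.random()) for _ in range(20)]
regions = greedy_bsp(pts, d=2)
print(len(regions), density(regions, (0.1, 0.1)), density(regions, (0.8, 0.8)))
```

On this toy input the cluster receives a far larger density value than the sparse background, while the per-region penalty keeps isolated background points from being cut into tiny regions. A greedy search is only a convenient stand-in here; it can settle on partitions that a sampling-based search over partitions would move away from.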
Bayesian Sequential Partitioning (BSP; Lu, Jiang and Wong, 2013) is a data-driven, nonparametric density estimation method for multivariate data. It sequentially bisects sub-regions of the sample space and constructs a histogram on the resulting partition. This avoids unnecessary cuts and yields accurate density estimates in high-dimensional space. The purpose of this study is to extend the BSP algorithm to very large sample sizes. We adopt a distributed computation strategy to develop Distributed-computing Bayesian Sequential Partitioning (DBSP), which makes BSP feasible for large data volumes and reduces the overall computation time. DBSP can estimate the density of large-volume, high-dimensional data with short computation time and high accuracy.
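The abstract does not spell out how DBSP distributes the work, so the following is only a generic divide-and-conquer sketch, not the thesis's scheme. The data are split into shards, each worker counts its shard's points in every region of a fixed partition (map), and the per-region counts are summed (reduce); this is valid because region counts are additive across disjoint shards. Threads are used only to keep the example self-contained: Python threads do not speed up CPU-bound counting, so a real deployment would use processes or cluster nodes. The names `count_in_boxes`, `distributed_counts` and `n_workers` are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def count_in_boxes(shard, boxes):
    """Map step: count how many of this shard's points fall in each half-open box [lo, hi)."""
    counts = [0] * len(boxes)
    for x in shard:
        for j, (lo, hi) in enumerate(boxes):
            if all(l <= xi < h for l, xi, h in zip(lo, x, hi)):
                counts[j] += 1
                break
    return counts

def distributed_counts(points, boxes, n_workers=4):
    """Split the data into n_workers shards, count each shard concurrently, then sum."""
    shards = [points[w::n_workers] for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_in_boxes, shards, [boxes] * n_workers))
    # Reduce step: per-region counts are additive across disjoint shards.
    return [sum(col) for col in zip(*partials)]

def histogram_density(counts, boxes):
    """Histogram density value n_j / (n * |r_j|) for each box."""
    n = sum(counts)
    return [c / (n * math.prod(h - l for l, h in zip(lo, hi)))
            for c, (lo, hi) in zip(counts, boxes)]

# Usage: a fixed three-region partition of [0, 1)^2 and 1000 uniform points.
rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(1000)]
boxes = [([0.0, 0.0], [0.5, 1.0]), ([0.5, 0.0], [1.0, 0.5]), ([0.5, 0.5], [1.0, 1.0])]
counts = distributed_counts(pts, boxes)
print(counts, histogram_density(counts, boxes))
```

Any partition score that depends on the data only through per-region counts, such as a multinomial histogram likelihood, can be evaluated from the reduced counts without moving raw points between workers.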
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070152615
http://hdl.handle.net/11536/76463
Appears in collections: Theses