Title: Bayesian Sequential Partitioning for Big Data Analysis on Heterogeneous Manycore Systems
異質多核心平台實現巨量資料的貝氏序向切割之分析研究
Authors: 李坤駿
Li, Kun-Chun
賴伯承
Lai, Bo-Cheng
Department of Electronics Engineering and Institute of Electronics
Keywords: machine learning; graphics processing unit; heterogeneous platform
Issue Date: 2013
Abstract: Uncovering information from large volumes of data in a timely manner has become an imperative task in the next wave of computing technologies. Estimating the density distribution of high-dimensional data samples is an effective way to comprehend the overall characteristics of the data space. Bayesian Sequential Partitioning (BSP) is a statistically effective density estimation algorithm for high-dimensional data. However, BSP is computationally expensive because of its complex statistical model, and data intensive because it must count a large volume of samples. This thesis proposes a high-performance design of BSP that leverages the computation capability of a heterogeneous many-core system. A series of techniques is applied to both the algorithm flow and the data management policies. With the proposed approaches, the performance bottleneck is alleviated and the runtime is significantly improved. The overall speedup of the BSP analysis on a heterogeneous many-core system reaches up to 155.3x compared with the reference design on a high-end CPU.
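The sample-counting step that the abstract identifies as data intensive can be illustrated with a minimal sketch. This is not the thesis's many-core implementation: it is a plain, sequential Python illustration that assumes axis-aligned regions cut at their midpoint, and every function name here (`split_region`, `count_in_region`, `piecewise_density`) is hypothetical. It shows only how a piecewise-constant density follows from per-region counts; the Bayesian scoring and selection of partitions is not covered.

```python
from math import prod


def split_region(region, dim):
    """Cut an axis-aligned box (low, high) at its midpoint along `dim`.

    Midpoint cuts are an assumption made for this illustration.
    """
    low, high = region
    mid = 0.5 * (low[dim] + high[dim])
    left_high = list(high)
    left_high[dim] = mid
    right_low = list(low)
    right_low[dim] = mid
    return (list(low), left_high), (right_low, list(high))


def count_in_region(samples, region):
    """Count samples that fall inside the half-open box [low, high)."""
    low, high = region
    return sum(
        all(l <= x < h for x, l, h in zip(sample, low, high))
        for sample in samples
    )


def piecewise_density(samples, regions):
    """Density in each region: count / (total samples * region volume)."""
    n = len(samples)
    densities = []
    for low, high in regions:
        volume = prod(h - l for l, h in zip(low, high))
        densities.append(count_in_region(samples, (low, high)) / (n * volume))
    return densities


if __name__ == "__main__":
    samples = [(0.1, 0.1), (0.2, 0.8), (0.3, 0.5), (0.9, 0.9)]
    left, right = split_region(([0.0, 0.0], [1.0, 1.0]), dim=0)
    print(piecewise_density(samples, [left, right]))
```

Each candidate cut requires recounting the samples in the affected regions, which is why the counting cost grows with both the number of samples and the number of cuts considered.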
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070050283
http://hdl.handle.net/11536/74186
Appears in Collections: Thesis