標題: | CUDT: 以CUDA為基礎之決策樹演算法 CUDT: A CUDA Based Decision Tree Algorithm |
作者: | 邱俊傑 Chiu, Chun-Chieh 袁賢銘 Yuan, Shyan-Ming 資訊科學與工程研究所 |
關鍵字: | GPGPU;CUDA;Decision Tree;Classification;GPGPU;CUDA;Decision Tree;Classification |
公開日期: | 2010 |
摘要: | 分類(classification)在機器學習(Machine Learning)和資料探勘(Data mining)中是一個很重要的議題。其中,決策樹被廣為運用在這個領域中,然而在現實生活中,資料多為高維度且資料量非常鉅量,在大量的資料下,整個決策樹建立的時間大多消耗在計算上,也就是這是一個運算密集的問題,也因此相當多的研究專注於加速分類模型的建立。
圖形處理器(GPU)是專門為處理圖形而設計的,因為影像的處理具高度平行化的特性,造就GPGPU 的產生; 許多研究專注於使用GPU 來處理非圖形處理的大量運算,其加速的效果非常驚人,因此也有著極高的性價比。而CUDA(Compute Unified Device Architecture)即為NVIDIA 所提出的GPGPU 的方案。
本論文基於NVIDIA’s CUDA 提出一個新的決策樹的演算法,在此架構中CPU 負責流程處理,而GPU 負責處理大量資料的運算。我們與著名的資料探勘軟體Weka 和SPRINT來做比較,結果顯示我們的CUDT 比起Weka 在效能上有6~5x 倍的加速,較大的資料上比起SPRINT 我們在效能上有18 倍的加速。 Classification is an important issue both in Machine Learning and Data Mining. Decision tree is one of the famous classification models. In the reality case, the dimension of data is high and the data size is huge. Building a decision in large data base cost much time in computation. It is a computationally expensive problem. GPU is a special design processor of graphic. The highly parallel features of graphic processing made today’s GPU architecture. GPGPU means use GPU to solve non-graphic problems which need amounts of computation power. Since the high performance and capacity/price ratio, many researches use GPU to process lots computation. Compute Unified Device Architecture (CUDA) is a GPGPU solution provided by NVIDIA. This paper provides a new parallel decision tree algorithm base on CUDA. The algorithm parallel computes building phase of decision tree. In our system, CPU is responsible for flow control and GPU is responsible for computation. We compare our system to the Weka-j48 algorithm. The result shows out system is 6~5x times faster than Weka-j48. Compare with SPRINT on large data set, our CUDT has about 18 times speedup. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079855558 http://hdl.handle.net/11536/48294 |
顯示於類別: | 畢業論文 |