Title: AdaBoost物件偵測演算法之異質運算加速研究
An AdaBoost Object Detection Algorithm for Heterogeneous Computing with OpenCL
Authors: 鄭秉揚
Cheng, Bing-Yang
郭峻因
Guo, Jiun-In
Department of Electronics Engineering and Institute of Electronics
Keywords: Classifier; Graphics processing unit (GPU); Heterogeneous System Architecture (HSA); Object detection; AdaBoost
Issue Date: 2013
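Abstract: Object detection is an important area of computer vision and appears in many applications and products, such as face detection and pedestrian detection. However, high detection accuracy always comes at the cost of speed and performance. One solution is a hardware implementation, which raises speed and lowers energy consumption but cannot be flexibly adjusted or updated to suit the algorithm. This thesis therefore proposes accelerating object detection on the heterogeneous multi-core platform of CPUs and GPUs in the Heterogeneous System Architecture (HSA), to meet the performance requirements of real-time implementation.
The object detection method studied in this thesis trains its classifier using the AdaBoost algorithm with rectangle features. "Feature matching" accounts for more than 98% of the total computation time of the AdaBoost algorithm and is difficult to compute in real time on the CPU alone, so this thesis focuses on accelerating feature matching. Given the characteristics of the AdaBoost algorithm, two problems chiefly limit execution efficiency after porting to the GPU: workload imbalance among detection windows and workload imbalance among scales. This thesis proposes a scale parallelization technique and a dynamic stage allocation technique, which resolve these problems by distributing work dynamically according to the characteristics of the AdaBoost algorithm and the real-time CPU/GPU workloads.
The proposed method makes good use of the architectural characteristics and computing resources of the CPU and GPU, effectively optimizes the AdaBoost algorithm for parallel execution, and resolves the performance bottleneck encountered when accelerating AdaBoost on the GPU. On the AMD A10-7850K platform it achieves D1 @ 44 fps.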
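The feature-matching step that dominates the runtime can be illustrated with a minimal Python sketch. It assumes the rectangle features are evaluated through an integral image (summed-area table), as is common for Haar-like features; the record does not state how the thesis computes them. The function names `integral_image`, `rect_sum`, and `two_rect_feature` are hypothetical and chosen only for this illustration.

```python
def integral_image(img):
    """Return a summed-area table with an extra zero row and column.

    Assumption: img is a non-empty list of equal-length rows of numbers.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of all pixels above and to the left, inclusive.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Hypothetical two-rectangle feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Each feature costs a fixed number of table lookups regardless of rectangle size, but a detector evaluates many features for every window at every scale, which is consistent with feature matching dominating the total computation time.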
Object detection is an important topic in computer vision. Detecting specific objects has many applications, e.g. face detection and pedestrian detection. However, high detection accuracy usually comes with high computational complexity, which can lead to poor runtime performance. Hardware implementation is one way to improve processing speed significantly, but it lacks flexibility when the algorithm needs to be updated. Thus, in this thesis we propose a parallel AdaBoost algorithm for implementation on a Heterogeneous System Architecture (HSA) platform consisting of multiple CPU and GPU cores.
AdaBoost classification with Haar-like features is used in the proposed algorithm for object detection. Feature calculation is the most time-consuming part of AdaBoost: it accounts for over 98% of the computation and cannot reach real-time processing on the CPU alone. Thus, we aim to accelerate the feature calculation in AdaBoost by exploiting both CPU and GPU computing resources. Two characteristics of the AdaBoost algorithm limit its performance on the GPU: load imbalance among detection windows and load imbalance among scales. Three solutions to these two problems are proposed: scale parallelization, stage parallelization, and dynamic system partitioning. With these three solutions, the proposed system achieves 44 fps on D1 video on an AMD A10-7850K processor.
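The two load-imbalance problems named in the abstract can be made concrete with a small Python sketch. It is a toy model, not the thesis's method: it assumes a cascade in which a window is rejected at the first stage it fails, and a sliding-window scan whose window size grows by a fixed factor per scale. The names `stages_evaluated` and `windows_per_scale`, the single per-window score, and all parameters are hypothetical.

```python
def stages_evaluated(window_score, stage_thresholds):
    """Count the cascade stages run for one window (toy model).

    Assumption: a window is summarized by one score and is rejected at the
    first stage whose threshold exceeds that score.
    """
    for i, threshold in enumerate(stage_thresholds, start=1):
        if window_score < threshold:
            return i  # rejected at stage i; later stages are skipped
    return len(stage_thresholds)  # passed every stage


def windows_per_scale(img_w, img_h, win, scale_factor, step):
    """List (window size, window count) for each scale of a sliding-window scan."""
    counts = []
    size = float(win)
    while size <= min(img_w, img_h):
        s = int(size)
        nx = (img_w - s) // step + 1  # window positions along x
        ny = (img_h - s) // step + 1  # window positions along y
        counts.append((s, nx * ny))
        size *= scale_factor
    return counts


# Window imbalance: neighbouring windows can need very different work.
thresholds = [1, 2, 3, 4]
work = [stages_evaluated(s, thresholds) for s in (0.5, 2.5, 10)]  # [1, 3, 4]

# Scale imbalance: small scales hold far more windows than large ones.
scales = windows_per_scale(8, 8, 4, 2.0, 2)  # [(4, 9), (8, 1)]
```

Under this model, GPU work-items that share a group finish at different stages, and launching one scale at a time leaves the large scales with too few windows to fill the device. Both effects are the kind of imbalance that scale parallelization and dynamic partitioning between CPU and GPU aim to reduce.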
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070150261
http://hdl.handle.net/11536/75535
Appears in Collections: Thesis