Title: Improving History-based Task Scheduling using Machine Learning Method on Multiple-GPU Platforms
Authors: Tsai, Yeh-Ning
You, Yi-Ping
Department: Institute of Computer Science and Engineering
Keywords: GPGPU; OpenCL; device abstraction; task scheduling; machine learning
Issue Date: 2016
Abstract:
The interest in using multiple graphics processing units (GPUs) to accelerate applications has increased in recent years. Several abstraction techniques have been proposed to ease the work, such as device selection and data transfer among multiple devices, for programmers to utilize the computing power of multiple GPUs. One well-designed framework, called VirtCL, implements a run-time system that provides a high-level abstraction of OpenCL devices so as to reduce the programming burden by acting as a layer between the programmer and the native OpenCL run-time system. The layer abstracts multiple devices into a single virtual device and schedules computations and communications among the multiple devices. VirtCL implements a history-based scheduler that schedules kernel tasks in a contention- and communication-aware manner. However, the scheduler has two problems: (1) VirtCL assumes that all the underlying GPU devices have the same compute capability, and (2) VirtCL assumes that there exists a linear relationship between the execution time of a kernel and its input data size. In fact, the execution time of a kernel is influenced not only by the input data size but also by the characteristics of the kernel. These two assumptions may result in imbalanced schedules, especially when the compute capabilities of the underlying devices vary. Therefore, in this thesis, we propose a method for predicting the execution time of a kernel based on a machine learning model, which takes the characteristics of the kernel and the compute capabilities of the underlying devices into consideration. The model construction consists of two phases: (1) clustering and (2) classification. In the clustering phase, the training kernels are clustered into groups with similar performance scaling behavior across different GPU devices. In the classification phase, a classifier is built to map the features of a kernel to a cluster.
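The two-phase model construction described above can be sketched as follows. This is a minimal, self-contained illustration, not the thesis's actual implementation: the kernel names, the feature vectors (e.g. branch and memory-operation counts), the per-device scaling vectors, and the choice of plain k-means plus a 1-nearest-neighbour classifier are all assumptions made for the sketch.

```python
import math

# Hypothetical training data: for each kernel, a vector of normalized
# execution times measured on each of three GPU devices (its "scaling
# behavior"), plus static kernel features. All values are illustrative.
kernels = {
    "vec_add":   {"scaling": [1.00, 0.52, 0.26], "features": [2, 8]},
    "mat_mul":   {"scaling": [1.00, 0.55, 0.27], "features": [3, 9]},
    "reduction": {"scaling": [1.00, 0.95, 0.90], "features": [9, 2]},
    "histogram": {"scaling": [1.00, 0.90, 0.88], "features": [8, 1]},
}

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    """Phase 1: cluster scaling-behavior vectors with plain k-means."""
    centroids = points[:k]  # naive init: first k points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

names = list(kernels)
centroids = kmeans([kernels[n]["scaling"] for n in names], k=2)
labels = {
    n: min(range(2), key=lambda i: dist(kernels[n]["scaling"], centroids[i]))
    for n in names
}

def classify(features):
    """Phase 2: map kernel features to a cluster (1-nearest-neighbour)."""
    nearest = min(names, key=lambda n: dist(features, kernels[n]["features"]))
    return labels[nearest]

# An unseen kernel whose features resemble the well-scaling group:
print(classify([3, 7]) == labels["vec_add"])  # prints True
```

The key design point the sketch preserves is the separation of concerns: clustering uses only measured per-device behavior (which requires running the kernel everywhere), while classification uses only static features, so an unseen kernel can be assigned a scaling class without being profiled on every device.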
Once the model is built, it is used at run time as a prediction model that takes the features of a kernel as input and outputs the predicted scaling behavior of the kernel; this information is combined with the execution history of the kernel to predict its execution time. With the more accurate execution time prediction, the scheduler in VirtCL can make better decisions when selecting a device for a kernel on multi-GPU platforms. Preliminary experimental results indicate that the proposed prediction model has an average prediction error of 31.5% on kernel execution times, and with the more accurate prediction, the overall throughput increased by an average of 24% for synthetic workload traces.
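The runtime step, combining the predicted scaling class with execution history to pick a device, could look like the following. This is a sketch under stated assumptions: the per-cluster scaling table, the three-device setup, the timings, and the earliest-finish-time selection rule are illustrative and not taken from the thesis.

```python
# Per-cluster relative execution time of each device versus a reference
# device 0 (hypothetical values): cluster 0 benefits strongly from faster
# GPUs, cluster 1 barely scales.
SCALING = {0: [1.00, 0.55, 0.27], 1: [1.00, 0.93, 0.89]}

def predict_times(cluster, history_time_dev0):
    """Predict per-device execution times by rescaling the historical
    execution time observed on the reference device 0."""
    return [history_time_dev0 * factor for factor in SCALING[cluster]]

def pick_device(cluster, history_time_dev0, device_ready_at):
    """Choose the device with the earliest predicted finish time
    (time the device becomes free + predicted execution time)."""
    times = predict_times(cluster, history_time_dev0)
    finish = [ready + t for ready, t in zip(device_ready_at, times)]
    return min(range(len(finish)), key=finish.__getitem__)

# A well-scaling kernel (cluster 0) that previously took 10 ms on
# device 0; the fastest device 2 is busy for another 5 ms:
print(pick_device(0, 10.0, [0.0, 0.0, 5.0]))  # prints 1
```

The sketch shows why the predicted scaling class matters: a kernel from the poorly scaling cluster would be routed differently than one from the well-scaling cluster even with identical history, which is exactly the imbalance the original linear-in-data-size assumption could not capture.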
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356023
http://hdl.handle.net/11536/140123
Appears in Collections: Thesis