Title: | Improving History-based Task Scheduling using Machine Learning Method on Multiple-GPU Platforms |
Authors: | Tsai, Yeh-Ning; You, Yi-Ping (Institute of Computer Science and Engineering) |
Keywords: | GPGPU; OpenCL; device abstraction; task scheduling; machine learning |
Issue Date: | 2016 |
Abstract: | The interest in using multiple graphics processing units (GPUs) to accelerate applications has increased in recent years. Several abstraction techniques have been proposed to ease tasks such as device selection and data transfer among multiple devices, so that programmers can utilize the computing power of multiple GPUs. One well-designed framework, called VirtCL, implements a run-time system that provides a high-level abstraction of OpenCL devices, reducing the programming burden by acting as a layer between the programmer and the native OpenCL run-time system. The layer abstracts multiple devices into a single virtual device and schedules computations and communications among them. VirtCL implements a history-based scheduler that schedules kernel tasks in a contention- and communication-aware manner. However, the scheduler has two problems: (1) VirtCL assumes that all the underlying GPU devices have the same compute capability, and (2) VirtCL assumes a linear relationship between the execution time of a kernel and its input data size. In fact, the execution time of a kernel is influenced not only by the input data size but also by the characteristics of the kernel. These two assumptions may result in imbalanced schedules, especially when the compute capabilities of the underlying devices vary. Therefore, this thesis proposes a method for predicting the execution time of a kernel based on a machine learning model that takes the characteristics of the kernel and the compute capability of the underlying devices into consideration. The model construction consists of two phases: (1) clustering and (2) classification. In the clustering phase, training kernels are clustered into groups with similar performance-scaling behavior across different GPU devices. In the classification phase, a classifier is built to map the features of a kernel to one of the clusters produced in the clustering phase. Once built, the model is used at run time as a prediction model: it takes the features of a kernel as input and outputs the predicted scaling behavior of the kernel; this information is then combined with the kernel's execution history to predict its execution time. With more accurate execution-time predictions, the scheduler in VirtCL can make better device-selection decisions on multi-GPU platforms. Preliminary experimental results indicate that the proposed prediction model has an average prediction error of 31.5% on kernel execution times, and with the more accurate predictions, the overall throughput increased by an average of 24% for synthetic workload traces. |
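The two-phase model described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the kernel names, feature vectors, scaling vectors, centroid seeds, and the nearest-centroid/1-nearest-neighbour choices are all hypothetical stand-ins for the actual clustering and classification methods used in the work.

```python
# Illustrative sketch (hypothetical data and methods, not the thesis code).
# Phase 1 clusters kernels by their scaling behaviour across GPUs;
# phase 2 maps kernel features to a cluster; at run time the predicted
# scaling is combined with execution history to estimate a kernel's time.
import math

# Training data: per-kernel (feature vector, scaling vector), where the
# scaling vector holds execution-time ratios on 3 GPUs relative to GPU 0.
training = {
    "vec_add": ([0.9, 0.1], [1.0, 0.50, 0.25]),  # compute-light kernel
    "saxpy":   ([0.8, 0.2], [1.0, 0.52, 0.27]),
    "mat_mul": ([0.2, 0.9], [1.0, 0.90, 0.80]),  # scales poorly here
    "stencil": ([0.3, 0.8], [1.0, 0.88, 0.78]),
}

# Phase 1: nearest-centroid clustering of scaling vectors (k = 2),
# seeded with two dissimilar scaling profiles.
centroids = [[1.0, 0.5, 0.25], [1.0, 0.9, 0.8]]
clusters = {name: min(range(2), key=lambda c: math.dist(sv, centroids[c]))
            for name, (_, sv) in training.items()}

# Phase 2: a 1-nearest-neighbour "classifier" from kernel features
# to the cluster label assigned in phase 1.
def classify(features):
    nearest = min(training, key=lambda n: math.dist(features, training[n][0]))
    return clusters[nearest]

# Run-time use: predicted scaling behaviour x execution history on the
# reference GPU gives the estimated time on the target GPU.
def predict_time(features, history_time_ref_ms, target_gpu):
    cluster = classify(features)
    return history_time_ref_ms * centroids[cluster][target_gpu]

# A new compute-light kernel that historically ran 10 ms on GPU 0:
print(predict_time([0.88, 0.12], 10.0, 2))  # -> 2.5 (ms, on GPU 2)
```

A scheduler like VirtCL's could then compare such per-device estimates (plus queue contention and transfer costs) when choosing a device for each kernel.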
URI: | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356023 http://hdl.handle.net/11536/140123 |
Appears in Collections: | Thesis |