Title: A Study on Device Abstraction and Resource Management for General-Purpose GPUs
VirtCL: A Framework for OpenCL Device Abstraction and Management
Authors: 吳翰融
Wu, Han-Jung
游逸平
You, Yi-Ping
Institute of Computer Science and Engineering
Keywords: GPGPU; OpenCL; GPGPU scheduling; GPGPU abstraction
Issue Date: 2013
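Abstract: In recent years, using heterogeneous multi-core systems to accelerate existing applications has become a prominent approach to parallel computing in high-performance computing (HPC). Platforms equipped with multiple GPUs, commonly called desktop supercomputers, have been widely adopted because they are inexpensive and deliver high performance. However, current heterogeneous programming models such as OpenCL provide no management of computing resources for multi-device platforms, so programmers targeting such platforms must implement device resource management themselves, for example device scheduling and memory-consistency maintenance. Moreover, because the information needed for scheduling (for example, how busy each device is) is not available to the programmer, the device the programmer selects is not necessarily the best choice. This thesis therefore proposes VirtCL, an OpenCL device abstraction and resource management framework consisting of two parts. The front end is an OpenCL-compatible library that provides OpenCL applications with an abstracted device and an interface to the back-end resource management platform. The back end is the resource management platform, which serves the general-purpose computing requests issued by the front end and implements memory management and device scheduling. We propose a scheduler, COST, which selects the best device for the user based on the historical execution times of kernels and the time OpenCL commands would have to wait on each device. Experimental results show that the abstraction interface implemented by VirtCL incurs a small overhead (10.44% on average), and that our scheduler delivers performance that scales nearly in proportion to the number of devices. Finally, scheduling simulations show that under high system load our scheduler provides twice the performance of a shortest-queue scheduler.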
Using multiple GPU devices to accelerate applications has become a growing area of interest in recent years. However, existing heterogeneous programming models, such as OpenCL, abstract the details of GPU devices at the per-device level and require programmers to explicitly schedule their kernel tasks on a system equipped with multiple GPU devices. Unfortunately, when multiple applications run on a multi-GPU system, they may compete for certain GPU devices, say the first device, while other GPU devices are left unused. Moreover, the distributed memory model defined in OpenCL, in which each device has its own memory space, complicates memory management among multiple GPU devices. In this thesis, we propose a framework called VirtCL, which acts as a layer between programmers and the native OpenCL runtime system. It abstracts multiple devices into a single virtual device and schedules computations and communications among the devices, thereby alleviating the programmers' burden. VirtCL comprises two main components: a front-end library, which exposes the primary OpenCL APIs and the virtual device, and a back-end runtime system called CLDaemon, which schedules and dispatches kernels using a history-based kernel scheduler. The front-end library forwards computation requests to CLDaemon, which then schedules and dispatches them. We also propose a history-based scheduler, COST, which schedules kernels in a contention- and data-aware fashion. The experimental results show that the VirtCL framework outperformed the native OpenCL runtime system for most benchmarks in the Rodinia benchmark suite, since the abstraction layer eliminated the heavyweight initialization of OpenCL contexts. The overhead analysis shows that the framework has a small overhead (10.44% on average).
The throughput of the proposed framework was measured under various kernel scheduling policies using the real-world application clsurf and trace-based simulation. The results show that the proposed scheduler outperformed native OpenCL and the other schedulers when the system load was very high, and that it also enabled scalability for applications running on multi-GPU systems.
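The abstract does not give the cost formula used by COST, so the following is only a minimal sketch of what a contention- and data-aware, history-based device choice could look like. It assumes the estimated cost of a device is the sum of the work already queued on it, the time to copy buffers not yet resident in its memory, and the mean of the kernel's past execution times on it. The names `Device`, `estimated_cost`, `pick_device`, and `default_run` are hypothetical and are not taken from VirtCL or CLDaemon.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Device:
    name: str
    # Estimated seconds of work already queued on this device (contention).
    pending_time: float = 0.0
    # Identifiers of buffers already present in this device's memory.
    resident_buffers: set = field(default_factory=set)
    # Kernel name -> list of previously observed execution times in seconds.
    history: dict = field(default_factory=dict)


def estimated_cost(dev, kernel, buffers, transfer_time, default_run=1.0):
    """Assumed cost model: queue wait + missing-data copies + historical run time."""
    past = dev.history.get(kernel)
    # Fall back to a guessed run time when the kernel has no history on this device.
    run = mean(past) if past else default_run
    # Only buffers that are not yet on the device need to be transferred.
    copy = sum(transfer_time[b] for b in buffers if b not in dev.resident_buffers)
    return dev.pending_time + copy + run


def pick_device(devices, kernel, buffers, transfer_time):
    """Return the device with the lowest estimated cost for this kernel launch."""
    return min(devices, key=lambda d: estimated_cost(d, kernel, buffers, transfer_time))
```

Under this assumed model, a lightly loaded device can win even when it must first receive the input data, and a busy device can still win when the data is already resident and the copy would be expensive. A shortest-queue policy, by contrast, would look at `pending_time` alone.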
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079955581
http://hdl.handle.net/11536/73741
Appears in Collections: Thesis