Title: 在通用圖型處理器上考量記憶體單元的執行排程
Memory Contention-Aware Warp Scheduler for GPGPUs
Author: 劉亞傑
Liou Ya-Jie
游逸平
You Yi-Ping
Institute of Computer Science and Engineering
Keywords: general-purpose graphics processing unit;execution scheduling;thread parallelism;memory unit;thread scheduling;contention-aware;thread-level parallelism;GPUs;CUDA
Issue Date: 2014
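Abstract: In recent years, graphics processing units (GPUs) have come to be used well beyond graphics processing. Their hardware architecture makes them well suited to running parallel programs, and a general-purpose graphics processing unit (GPGPU) is a GPU that can be used for general computation, so GPGPU execution performance has become an important issue. The main goal of this study is to provide a new scheduling algorithm for GPGPUs that addresses the shortage of memory units that can arise during program execution. We schedule according to the state of the memory units on the GPGPU: if a memory unit can currently execute a memory instruction, the scheduler actively selects a memory instruction to execute; if all memory units are busy, it selects a compute instruction instead. This reduces the time spent waiting for the memory unit and improves execution performance.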
Modern general-purpose graphics processing units (GPGPUs) exploit parallelism in applications by building massively parallel architectures and applying multithreading techniques to hide instruction and memory latencies. Such architectures have become increasingly popular for parallel applications written in the CUDA/OpenCL programming languages. In this paper, we investigate thread (warp) scheduling algorithms on such highly threaded GPGPUs. Traditional round-robin scheduling schemes are inefficient at handling instruction execution and memory accesses with disparate latencies. We introduce a memory contention-aware warp scheduler that schedules warps by checking the status of the memory unit: if the memory unit is available, it schedules a memory instruction whenever possible; if not, it tries to avoid scheduling a memory instruction. This approach maximizes the utilization of the memory unit. Performance evaluations demonstrate that the proposed scheduler improved the execution times of programs from the NVIDIA SDK, the Rodinia benchmark suite, and the Parboil benchmark suite by 12.36%, 2.87%, and 2.77%, respectively, over the fine-grained round-robin scheme.
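The scheduling policy described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Warp` class, the `pick_warp` function, and the round-robin tie-breaking are hypothetical names and choices made for illustration. Two fallback behaviours are also assumptions, since the abstract does not specify them: when the memory unit is free but no warp has a memory instruction ready, the sketch issues a compute instruction; when the memory unit is busy and only memory instructions are ready, it issues nothing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Warp:
    """Hypothetical warp state used only for this illustration."""
    wid: int
    next_is_memory: bool  # True if the warp's next instruction is a load/store
    ready: bool = True    # False if the warp is stalled and cannot issue


def pick_warp(warps: List[Warp], memory_unit_free: bool, start: int = 0) -> Optional[int]:
    """Return the index of the warp to issue this cycle, or None to stall.

    Warps are examined in round-robin order beginning at `start`.
    """
    n = len(warps)
    if n == 0:
        return None
    order = [(start + i) % n for i in range(n)]
    ready = [i for i in order if warps[i].ready]
    memory_ready = [i for i in ready if warps[i].next_is_memory]
    compute_ready = [i for i in ready if not warps[i].next_is_memory]

    # Memory unit available: prefer a memory instruction to keep the unit busy.
    if memory_unit_free and memory_ready:
        return memory_ready[0]
    # Memory unit busy (or no memory instruction ready): issue a compute instruction.
    if compute_ready:
        return compute_ready[0]
    # Assumption: stall rather than send a memory instruction to a busy unit.
    return None
```

For example, with three ready warps whose next instructions are compute, memory, and compute, `pick_warp` selects the second warp while the memory unit is free and the first warp while it is busy. A plain round-robin scheduler would select the first warp in both cases.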
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070156089
http://hdl.handle.net/11536/76489
Appears in Collections: Thesis