Title: | Partial Flattening: A Compilation Technique for Irregular Nested Parallelism on GPGPUs |
Authors: | Huang, Ming-Hsiang; Yang, Wuu; Published under the NCTU name; National Chiao Tung University |
Issue Date: | 2016 |
Abstract: | Supporting irregular nested parallelism on modern GPUs requires much effort. One should distribute the parallel tasks evenly while preserving reasonable memory usage. Moreover, the task distribution should also fit the thread hierarchy of the underlying GPU to fully exploit its computing power. We propose partial flattening, an automatic code transformation which translates annotated C programs to CUDA kernels. Thread blocks are treated as flat SIMT processors. Iterations are dynamically organized into batches. Batches are executed in a sequential (depth-first) order. A kernel is treated as multiple independent SIMT processors with an additional task-stealing mechanism. Partial flattening allows easy expression of nested parallelism and synchronization by annotating nested parallel loops or parallel-recursive calls, while preserving reasonable memory usage by the depth-first execution order. Our 2-level task distribution scheme does not need special hardware support, and fits well with the CUDA thread hierarchy. Experiments show that partial flattening outperforms NESL significantly in most benchmarks, and obtains 2.15x and 67x speedup over CUDA dynamic parallelism in Quicksort and the Bron-Kerbosch algorithm, respectively. |
URI: | http://dx.doi.org/10.1109/ICPP.2016.70 http://hdl.handle.net/11536/136474 |
ISBN: | 978-1-5090-2823-8 |
ISSN: | 0190-3918 |
DOI: | 10.1109/ICPP.2016.70 |
Journal: | PROCEEDINGS 45TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - ICPP 2016 |
Start Page: | 552 |
End Page: | 561 |
Appears in Collections: | Conferences Paper |
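
Illustrative note: the abstract says that iterations are organized into batches and that batches run in depth-first order on a thread block treated as a flat SIMT processor. The Python sketch below is a toy, sequential model of that idea only; it is not the paper's transformation and generates no CUDA code. The function name `run_depth_first`, the ragged-array workload, and the `width` parameter (standing in for the number of SIMT lanes) are all assumptions made for illustration.

```python
def run_depth_first(rows, width):
    """Toy model (hypothetical, not the paper's algorithm): sum each row of a
    ragged array using batches of at most `width` iterations."""
    sums = [0] * len(rows)
    batches_run = 0
    # Outer-level iterations are grouped into batches of up to `width`.
    for start in range(0, len(rows), width):
        outer_batch = range(start, min(start + width, len(rows)))
        # Inner iterations spawned by this outer batch are pooled together,
        # regardless of which outer iteration produced them.
        inner = [(i, j) for i in outer_batch for j in range(len(rows[i]))]
        # Depth-first: finish every inner batch of this outer batch before
        # starting the next outer batch, so only one pool is live at a time.
        for k in range(0, len(inner), width):
            for i, j in inner[k:k + width]:  # one modelled SIMT step
                sums[i] += rows[i][j]
            batches_run += 1
    return sums, batches_run


if __name__ == "__main__":
    ragged = [[1, 2, 3, 4, 5], [6], [], [7, 8]]
    print(run_depth_first(ragged, 4))
```

With `width = 4`, the eight inner iterations of the ragged input fill two full batches, whereas giving each row its own batches would take four partially filled ones (2 + 1 + 0 + 1). This uneven-row case is the kind of irregularity the abstract refers to.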