Partial Flattening: A Compilation Technique for Irregular Nested Parallelism on GPGPUs

doi:10.1109/ICPP.2016.70

Title:	Partial Flattening: A Compilation Technique for Irregular Nested Parallelism on GPGPUs
Authors:	Huang, Ming-Hsiang; Yang, Wuu (published under the name of National Chiao Tung University)
Publication date:	2016
Abstract:	Supporting irregular nested parallelism on modern GPUs requires much effort. One should distribute the parallel tasks evenly while preserving reasonable memory usage. Moreover, the task distribution should also fit the thread hierarchy of the underlying GPU to fully exploit its computing power. We propose partial flattening, an automatic code transformation which translates annotated C programs to CUDA kernels. Thread blocks are treated as flat SIMT processors. Iterations are dynamically organized into batches. Batches are executed in a sequential (depth-first) order. A kernel is treated as multiple independent SIMT processors with an additional task-stealing mechanism. Partial flattening allows easy expression of nested parallelism and synchronization by annotating nested parallel loops or parallel-recursive calls, while preserving reasonable memory usage by the depth-first execution order. Our 2-level task distribution scheme does not need special hardware support, and fits well with the CUDA thread hierarchy. Experiments show that partial flattening outperforms NESL significantly in most benchmarks, and obtains 2.15x and 67x speedup over CUDA dynamic parallelism in Quicksort and the Bron-Kerbosch algorithm, respectively.
URI:	http://dx.doi.org/10.1109/ICPP.2016.70; http://hdl.handle.net/11536/136474
ISBN:	978-1-5090-2823-8
ISSN:	0190-3918
DOI:	10.1109/ICPP.2016.70
Journal:	PROCEEDINGS 45TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - ICPP 2016
Start page:	552
End page:	561
Appears in Collections:	Conferences Paper