Partial Flattening: A Compilation Technique for Irregular Nested Parallelism on GPGPUs

doi:10.1109/ICPP.2016.70

Full metadata record

DC Field	Value	Language
dc.contributor.author	Huang, Ming-Hsiang	en_US
dc.contributor.author	Yang, Wuu	en_US
dc.date.accessioned	2017-04-21T06:48:13Z	-
dc.date.available	2017-04-21T06:48:13Z	-
dc.date.issued	2016	en_US
dc.identifier.isbn	978-1-5090-2823-8	en_US
dc.identifier.issn	0190-3918	en_US
dc.identifier.uri	http://dx.doi.org/10.1109/ICPP.2016.70	en_US
dc.identifier.uri	http://hdl.handle.net/11536/136474	-
dc.description.abstract	Supporting irregular nested parallelism on modern GPUs requires much effort. One should distribute the parallel tasks evenly while preserving reasonable memory usage. Moreover, the task distribution should also fit the thread hierarchy of the underlying GPU to fully exploit its computing power. We propose partial flattening, an automatic code transformation which translates annotated C programs to CUDA kernels. Thread blocks are treated as flat SIMT processors. Iterations are dynamically organized into batches. Batches are executed in a sequential (depth-first) order. A kernel is treated as multiple independent SIMT processors with an additional task-stealing mechanism. Partial flattening allows easy expression of nested parallelism and synchronization by annotating nested parallel loops or parallel-recursive calls, while preserving reasonable memory usage by the depth-first execution order. Our 2-level task distribution scheme does not need special hardware support, and fits well with the CUDA thread hierarchy. Experiments show that partial flattening outperforms NESL significantly in most benchmarks, and obtains 2.15x and 67x speedup over CUDA dynamic parallelism in Quicksort and the Bron-Kerbosch algorithm, respectively.	en_US
dc.language.iso	en_US	en_US
dc.title	Partial Flattening: A Compilation Technique for Irregular Nested Parallelism on GPGPUs	en_US
dc.type	Proceedings Paper	en_US
dc.identifier.doi	10.1109/ICPP.2016.70	en_US
dc.identifier.journal	PROCEEDINGS 45TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - ICPP 2016	en_US
dc.citation.spage	552	en_US
dc.citation.epage	561	en_US
dc.contributor.department	交大名義發表	zh_TW
dc.contributor.department	National Chiao Tung University	en_US
dc.identifier.wosnumber	WOS:000387089600063	en_US
dc.citation.woscount	0	en_US
Appears in Collections:	Conference Papers