Full metadata record
DC Field | Value | Language
dc.contributor.author | Huang, Ming Hsiang | en_US
dc.contributor.author | Yang, Wuu | en_US
dc.date.accessioned | 2020-10-05T01:59:43Z | -
dc.date.available | 2020-10-05T01:59:43Z | -
dc.date.issued | 1970-01-01 | en_US
dc.identifier.issn | 0038-0644 | en_US
dc.identifier.uri | http://dx.doi.org/10.1002/spe.2868 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/154851 | -
dc.description.abstract | OpenACC is a directive-based programming model which allows programmers to write graphics processing unit (GPU) programs by simply annotating parallel loops. However, OpenACC has poor support for irregular nested parallel loops, which are natural choices to express nested parallelism. We propose PFACC, a programming model similar to OpenACC. PFACC directives can be used to annotate parallel loops and to guide data movement between different levels of memory hierarchy. Parallel loops can be arbitrarily nested or be placed inside functions that would be (possibly recursively) called in other parallel loops. The PFACC translator translates C programs with PFACC directives into CUDA programs by inserting runtime iteration-sharing and memory allocation routines. The PFACC runtime iteration-sharing routine is a two-level mechanism. Thread blocks dynamically organize loop iterations into batches and execute the batches in a depth-first order. Different thread blocks share iterations among one another with an iteration-stealing mechanism. PFACC generates CUDA programs with reasonable memory usage because of the depth-first execution order. The two-level iteration-sharing mechanism is implemented purely in software and fits well with the CUDA thread hierarchy. Experiments show that PFACC outperforms CUDA dynamic parallelism in terms of performance and code size on most benchmarks. | en_US
dc.language.iso | en_US | en_US
dc.subject | dynamic scheduling | en_US
dc.subject | GPGPU | en_US
dc.subject | irregular parallelism | en_US
dc.subject | nested parallelism | en_US
dc.subject | OpenACC | en_US
dc.subject | parallel programming model | en_US
dc.subject | PFACC | en_US
dc.title | PFACC: An OpenACC-like programming model for irregular nested parallelism | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.1002/spe.2868 | en_US
dc.identifier.journal | SOFTWARE-PRACTICE & EXPERIENCE | en_US
dc.citation.spage | 0 | en_US
dc.citation.epage | 0 | en_US
dc.contributor.department | 資訊工程學系 | zh_TW
dc.contributor.department | Department of Computer Science | en_US
dc.identifier.wosnumber | WOS:000546570800001 | en_US
dc.citation.woscount | 0 | en_US
Appears in Collections: Journal Articles