Title: An Efficient Run-Time Parallelizing Method for Multiprocessor Systems
(在多处理机系统上的执行时期平行化方法)
Authors: Hsieh, Ming-Huei (谢明辉)
Shian-Shyong Tseng (曾宪雄)
Institute of Computer Science and Engineering
Keywords: run-time; wavefront; parallelism
Issue Date: 1995
Abstract: Loops hold much of the parallelism in a program, and
loop-level parallelism is the resource most commonly exploited
by parallelizing compilers. To parallelize a sequential loop, a
parallelizing compiler computes a parallel schedule of the
iterations from a static data dependence analysis at compile
time. Some loops, however, contain parallelism that cannot be
detected this way. In sparse matrix computations, for example,
array subscripts often involve indirection arrays or functions
and thus defy static analysis. The loop iterations are then
conservatively executed sequentially, and the potential
parallelism is sacrificed.
Motivated by these concerns, this thesis proposes a two-phase
run-time parallelization technique, based on the
inspector-executor scheme, that extracts the parallelism
available in such loops at run time. The inspector builds a
DEF-USE table to determine the wavefronts, that is, the sets of
loop iterations that can execute in parallel. The inspector
itself is fully parallel and needs no synchronization, which
reduces the overhead of determining the wavefronts. The improved
executor runs the iterations of each wavefront concurrently and
uses an auto-adapted function to obtain a suitable number of
threads instead of a fixed, pre-specified number.
Experimental results show that the new parallel inspector
algorithm can handle complex data dependence patterns that
previous approaches cannot, and that it markedly shortens the
inspector's own running time. In addition, the new executor
strategy improves the overall efficiency of run-time
parallelization and increases the utilization of the
multiprocessor system.
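
The inspector-executor idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the record gives no implementation details, so the loop form `a[writes[i]] = a[reads[i]] + i`, the dictionary-based DEF and USE tables, and the names `inspect_wavefronts`, `pick_thread_count`, and `execute` are all assumptions made for illustration. In particular, this inspector is sequential (the thesis describes a fully parallel one), and `pick_thread_count` is only a hypothetical stand-in for the auto-adapted function.

```python
from concurrent.futures import ThreadPoolExecutor


def inspect_wavefronts(reads, writes):
    """Group iterations of `a[writes[i]] = a[reads[i]] + i` into wavefronts.

    Sequential illustration only; iterations in one wavefront share no
    flow, anti, or output dependence.
    """
    last_def = {}  # element -> wavefront of its most recent write (DEF)
    last_use = {}  # element -> highest wavefront that has read it (USE)
    wavefronts = []
    for i, (r, w) in enumerate(zip(reads, writes)):
        wf = max(
            last_def.get(r, -1) + 1,  # flow: read after an earlier write
            last_def.get(w, -1) + 1,  # output: write after an earlier write
            last_use.get(w, -1) + 1,  # anti: write after an earlier read
        )
        last_use[r] = max(last_use.get(r, -1), wf)
        last_def[w] = wf
        if wf == len(wavefronts):  # first iteration of a new wavefront
            wavefronts.append([])
        wavefronts[wf].append(i)
    return wavefronts


def pick_thread_count(num_iterations, max_threads):
    """Hypothetical stand-in for the auto-adapted function (assumption)."""
    return max(1, min(num_iterations, max_threads))


def execute(a, reads, writes, wavefronts, max_threads=4):
    """Run the wavefronts in order; iterations inside one run concurrently."""

    def body(i):
        a[writes[i]] = a[reads[i]] + i

    for wf in wavefronts:
        workers = pick_thread_count(len(wf), max_threads)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Leaving the `with` block waits for every iteration of this
            # wavefront, which acts as the barrier before the next one.
            list(pool.map(body, wf))
    return a


# Indirection arrays that are only known at run time.
reads = [5, 6, 0, 1, 2, 7]
writes = [0, 1, 2, 3, 0, 1]
wavefronts = inspect_wavefronts(reads, writes)
result = execute(list(range(8)), reads, writes, wavefronts)
```

Sizing the thread pool per wavefront, rather than fixing it once, reflects the abstract's point that a narrow wavefront does not benefit from a large fixed number of threads. The actual auto-adapted function may weigh other factors that this record does not describe.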
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT840394006
http://hdl.handle.net/11536/60445
Appears in Collections: Thesis