Title: | An Efficient Run-Time Parallelizing Method for Multiprocessor Systems (在多处理机系统上的执行时期平行化方法) |
Authors: | 谢明辉 Hsieh, Ming-Huei; 曾宪雄 Tseng, Shian-Shyong; Institute of Computer Science and Engineering |
Keywords: | run-time; wavefront; parallelism |
Issue Date: | 1995 |
Abstract: | Loop-level parallelism is the most common source of parallelism exploited by parallelizing compilers. To parallelize a sequential loop, a parallelizing compiler must compute a parallel schedule of the iterations based on static data dependence analysis at compile time. Some loops, however, contain parallelism that cannot be detected this way. For example, in sparse matrix computations, array subscripts often involve indirection arrays or functions and thus defy static analysis. In such cases the loop iterations are conservatively executed sequentially, sacrificing potential parallelism. Motivated by these concerns, this thesis proposes a two-phase run-time technique based on the inspector-executor scheme (an inspection phase and an execution phase) for extracting the available parallelism in loops. The inspector determines the wavefronts, the sets of loop iterations that can be executed in parallel, by building a DEF-USE table. In addition, the inspector itself is fully parallel and requires no synchronization, which reduces the overhead of determining the wavefronts. The improved executor performs the loop iterations of each wavefront concurrently and uses an auto-adapted function to obtain a tailored number of threads rather than a fixed, user-specified number. Experimental results show that the new parallel inspector algorithm can handle complex data dependence patterns that previous approaches cannot, and that it noticeably reduces its own running time. Furthermore, the new executor strategy improves the performance of run-time parallelization and achieves high utilization of the multiprocessor system. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT840394006 http://hdl.handle.net/11536/60445 |
Appears in Collections: | Thesis |
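
The inspector-executor scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the inspector here is sequential (the thesis describes a fully parallel one), the loop body `a[writes[i]] = a[reads[i]] + 1` is an assumed example of indirect array access, and the thread-count rule `min(len(front), max_threads)` is only a hypothetical stand-in for the auto-adapted function, whose definition the record does not give. The names `inspect`, `execute`, `last_def`, and `last_use` are likewise illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def inspect(writes, reads):
    """Sequential inspector: group loop iterations into wavefronts.

    Iteration i is assumed to read a[reads[i]] and write a[writes[i]].
    last_def / last_use play the role of a small DEF-USE table: for each
    array element they record the latest wavefront that wrote / read it.
    """
    last_def = {}
    last_use = {}
    fronts = []
    for i, (w, r) in enumerate(zip(writes, reads)):
        level = max(
            last_def.get(r, -1) + 1,  # flow dependence (read after write)
            last_def.get(w, -1) + 1,  # output dependence (write after write)
            last_use.get(w, -1) + 1,  # anti dependence (write after read)
        )
        if level == len(fronts):
            fronts.append([])
        fronts[level].append(i)
        last_use[r] = max(last_use.get(r, -1), level)
        last_def[w] = level
    return fronts


def execute(a, writes, reads, fronts, max_threads=4):
    """Executor: run each wavefront concurrently, one wavefront at a time."""

    def body(i):
        a[writes[i]] = a[reads[i]] + 1  # assumed example loop body

    for front in fronts:
        # Hypothetical stand-in for the auto-adapted thread count.
        n_threads = min(len(front), max_threads)
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            list(pool.map(body, front))  # leaving the pool acts as a barrier
    return a


if __name__ == "__main__":
    writes = [0, 1, 2, 0, 1]
    reads = [3, 3, 0, 4, 2]
    fronts = inspect(writes, reads)
    print(fronts)
    print(execute([0, 0, 0, 5, 7], writes, reads, fronts))
```

Iterations placed in the same wavefront have no flow, anti, or output dependence on one another, so the executor may run them in any order; the wavefronts themselves are executed in sequence.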