標題: | 在多處理機系統上的執行時期平行化方法 An Efficient Run-Time Parallelizing Method for Multiprocessor Systems |
作者: | 謝明輝 Hsieh, Ming-Huei 曾憲雄 Shian-Shyong Tseng 資訊科學與工程研究所 |
關鍵字: | 執行時期;波前;平行度;run-time;wavefront;parallelism |
公開日期: | 1995 |
摘要: | 迴圈在一程式中存在有大量的平行度,為了將此程式平行化,平行編譯器 利用靜態資料相依性分析來獲得迴圈的平行度。然而,有些迴圈則無法於 編譯時期取得資料相依性的資訊。例如,在稀疏矩陣計算上,陣列述語內 通常包含了間接陣列或函式,而無法利用靜態資料相依性分析。故便保守 的將程式循序的執行,而犧牲了潛在的平行度。因此,在此論文中則提出 了一個兩階段 (偵測階段及執行階段) 的執行時期平行化方法於執行時期 擷取出迴圈中潛在的平行度。偵測階段經由建立一DEF-USE表而決定出可 平行執行的迴圈輪替集合-波前,此外,此偵測階段本身可以被完全的平 行化以減少因決定波前所照成的額外負擔。而經改良的執行階段則根據波 前來執行迴圈並且使用auto-adapted函式來獲得合適的Thread數量而非傳 統固定的指定Thread數量。實驗的結果顯示,這個平行偵測演算法能處理 較複雜的資料相依性而且能明顯縮短本身執行時間。此外,在執行階段所 利用的新策略能提高整個執行時期平行化的效率並且增加多處理機系統的 利用度。 Loop-level parallelism is the most common resource to be exploited by parallelizing compiler. To parallelize a sequential loop, a parallelizing compiler must compute a parallel schedule of the iterations based on a static data dependenceanalysis at compile-time. Some loops, however, may contain parallelism not detectable in this way. For example, insparse matrix computations, array subscripts often involve indirection arrays and thus defy static analysis. In conservatively, the loop iterations in such examples will be performed sequentially. Motivated by these concerns, a run-time technique based on inspector-executor scheme is proposed for finding available parallelism on loops in this thesis. Our inspector can determine the wavefronts by building DEF-USE table. Additionally, the inspector is fully parallel without any synchronization for reducing overhead that indicates the wavefronts. Our improved executor performs the loop iterations concurrently for each wavefront in a loop by using auto-adapted function to get a tailored thread number rather than using fixed thread number. Experimental results show that our new parallel inspector algorithm can handle complex data dependency patterns that cannot be performed by the previousresearches and reduce itself running time obviously. Besides, the new strategyfor executor can also achieve high system utilization and improve the performance of run-time parallelization. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT840394006 http://hdl.handle.net/11536/60445 |
Appears in Collections: | Thesis |