標題: 使用單一模組轉換做巢狀迴圈具相依性之切割
Dependent Partitioning of Nested Loops by unimodular transformations
作者: 劉懿徵
Yih-Jeng Liu
劉啟民
Chi-Min Liu
資訊科學與工程研究所
關鍵字: 巢狀迴圈切割;平行處理;單一模組轉換;巨量平行度;微量平行度;NestedLoopPartitioning;ParallelProcessing;unimodular Xmation; coarsest,finest-granularity parallelism
公開日期: 1992
摘要: 對許多數值運算程式而言,巢狀迴圈的執行是相當耗時的。目前已有一些 方法用於獲得不同形態的平行度: 如 巨量平行度 (coarsest- granularity parallelism), 微量平行度 (finest- granularity parallelism) ,以及中階平行度 (intermediate-level parallelism)。在這之中, 針對巨量與微量的方法對許多程式而言適用 性不高。 群集 (grouping)方法的作用在於擷取中階平行度而且可當作 處理微量平行度的方法的一種延伸。在本論文中我們提出一個獲得中階平 行度的新方法。此方法以巨量平行度為出發點,因此可視為處理巨量平行 度的方法的一種延伸。透過放大一相依向量然後做單一模組轉換將巢狀迴 圈切割成一群具相依性的集合之後,我們能獲得較具彈性的集合個數而且 集合間的平行度亦被考量。我們討論了如何判斷一個相依向量是否適合被 利用來增加集合個數以及透過此相依向量能獲得的集合個數為何等問題, 同時亦提出一個能找到適當切割的程序。由於在此程序中對於每一種可能 的切割都必須執行一組單一模組轉換,因此我們提出能節省某些單一模組 轉換所需時間的方法。具有相依關係的集合之間有資料傳輸的需要。這些 資料傳輸所帶來的不利影響可透過與計算工作適當的重疊而被降低。 The execution of nested loops is most time-consuming for many numerical programs. There have been various approaches presented to extract various parallelism : the coarsest- granularity parallelism, the finest-granularity parallelism and the intermediate-level parallelism. Among them, the methods for finest-granularity and coarsest- granularity parallelism are good but not applicable in many programs. Grouping methods are developed for extracting the intermediate-level parallelism and can be considered as the extension of the methods for the finest -granularity parallelism. In this thesis,we present a new method to extract the intermediate-level parallelism. This method approaches the intermediate-level parallelism from the coarsest- granularity parallelism and can be considered as a extension of the methods for the coarsest-granularity parallelism. The presented method is to partition a nested loop into sets with dependent relation. The method is accomplished through unimodular transformation by lengthening one of the dependent vectors in a nested loop. The number of sets we can get is flexible and has considerations to the parallelism among sets. We show the method to check whether one dependent vector can be used to increase the number of partitioned sets, and the attainable number of partitioned sets one can get through this dependent vector. We propose a procedure to guide the process for finding a suitable partition. Because, in this procedure, one set of unimodular transformations is performed for each possible partition, a simple scheme is also provided to save the time for some sets of unimodular transformations. Data communication is needed between dependent sets.The communication overhead can be decreased due to a suitable overlap between computation and communication.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT810392078
http://hdl.handle.net/11536/56814
顯示於類別:畢業論文