標題: | 巢狀迴圈自動切割技術之研究 A Study of Automatic Partition Techniques for Nested Loops |
作者: | 李浩肇 Lee, Haw-Jaw 陳正 Cheng Chen 資訊科學與工程研究所 |
關鍵字: | 自動切割;Decomposition |
公開日期: | 1995 |
摘要: | 本論文的重點在探討巢狀迴圈的自動切割技術﹐即巢狀迴圈的計算分配與 資 料分配的方法。在NUMA(Non-Uniform Memory Access time)系 統﹐遠端的(Remote) 資料存取時間會比區域性的(Local)資料存取時間長 很多﹐所以迴圈的資料分配和 計算分配的方式會影響系統效能﹐兩者之 間息息相關。如何將一個程式的計算 (Computation)及資料(Data)作 適當的分配(Decomposition)到各個處理機上﹐在開 發平行度時﹐也能同 時考慮降低資料傳輸時間﹐以達到系統的最佳效能﹐是平行 編譯器的 一個重要課題。本論文基本上是以Q.Ning[Ning 95]的方法為藍本﹐改 善其在遇到無法找出Communication-Free分配方式時的方法。我們提出 Relax演 算法﹐定義相依比重(Dependence Weight)的觀念﹐先計算每 個迴圈的相依比重﹐ 然後依照比重由小至大的順序﹐依次將其產生的資 料區域性限制放鬆(Relax)﹐找 出Pipeline Execution、Pipeline Communication的分配方式。另外﹐我們也說明 如何估計執行時間﹐以評 估計算與資料分配的好壞。本論文所提出的切割方法已 經實作在一個 SUIF平行編譯研究環境[Stan 94]上,並以我們發展的平行模擬環 境 來評估其效果。初步評估顯示,我們的方法執行時間平均較[Ning95]改善 8.5%。 The purpose of this thesis is to study an automatic partition technique for nested loops, i.e., the computation and data decomposition method for nested loops. In the NUMA(Non-Uniform Memory Access time) system, the remote memory access time is longer than local one, the computation and data decomposition of the program will strongly impact the system performance, and they are 謻rrelated. Thus the partition, or decomposition, of a program's computation and data onto each processor of parallel machine, incasing the parallelism as well as decreasing the data communication time, in order to optimize the system performance, is a key issue of parallel compiler. This thesis is based on Q. Ning[Ning95]'s method. When the decomposition is not communication-free, we propose a relax algorithm to improve the [Ning95]'s method. We first compute the dependence weight, and relax the data locality constraints according the dependence weight in the decreasing order, to find out the better pipeline execution and pipeline comnication decomposition. We also describe how to estimate the execution time, to evaluate the decomposition result. This thesis's method has already implemented on the parallel compiler SUIF[Stan94], and evaluate the results on a parallel simulation environment. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT840392055 http://hdl.handle.net/11536/60400 |
Appears in Collections: | Thesis |