平行編譯器之 DOACROSS 迴圈平行化

標題:	平行編譯器之 DOACROSS 迴圈平行化 DOACROSS Loops Parallelization for Parallelizing Compilers
作者:	高世宏 Shih-Hung Kao 曾憲雄 Shian-Shyong Tseng 資訊科學與工程研究所
關鍵字:	平行編譯器; 平行度; 執行時期的平行化;資料相依性;parallelizing compiler; parallelism; run-time parallelization; data dependence
公開日期:	1994
摘要:	在平行編譯器的偵測上,迴圈存在大量的平行度。目前現存的平行編譯器當中，大部分只針對 DOALL 這類型的迴圈來做處理。然而 DOACROSS 這類型的迴圈也存在潛在豐富的平行度，但是卻被許多的平行編譯器所忽略。在這篇論文當中，我們針對 DOACROSS 迴圈的平行化提出一個架構。在這個架構之下，我們將 DOACROSS 迴圈的平行化分成兩個部分：編譯時期的平行化、執行時期的平行化。如果一個 DOACROSS 迴圈，具有恆定一致的資料相依性，透過適當的同步協調，可以成功的將這類型的 DOACROSS 迴圈平行化。然而，假使我們無法在編譯時期獲得有關迴圈中資料相依性的訊息，或是資料相依性是屬於不規則性，都將使得迴圈的前置處理器做保守的選擇，犧牲潛在的平行度。針對這樣的情形，我們提出了一個執行時期方法來處理這樣的迴圈。這個方法是基於一個兩階段架構 (偵測階段、執行階段)。我們針對偵測階段提出一個通用的演算法來改善迴圈的排程問題。實驗的結果顯示，這個通用的演算法可以處理任何資料相依性，而且可以很有效率的執行。此外，實驗結果也顯示，當一個迴圈的工作負載不一致時，對於執行階段可能要考慮不同的排程策略。 Loop-level parallelism is the most common resource to be exploited by parallelizing compiler. On most existing parallelizing compiler, only DOALL loops parallelization are supported. However, DOACROSS loops which are ignored by most current parallelizing compiler exist plentiful parallelism. In this thesis, a DOACROSS loops parallelization model is proposed. The parallelization for DOACROSS loops is divided into two parts: compile time parallelization and run-time parallelization. DOACROSS loop with constant uniform dependence distance can be parallelized by proper synchronizations. It is also found that parallelizing DOACROSS loops can obtain obvious speedup. However, if the dependence distance is non-uniform or array index is an indexing function, these will make data dependence test conservative. A run-time method is proposed to handle such loops. This method is based on insp/exec loop transformation (inspector phase and executor phase). We propose a general algorithm for the inspector phase to improve the capability to solve loop scheduling problem. Our algorithm can determine the wavefronts of a loop with any complex array reference relations by building DEF-USE table. The experimental results show that the new algorithm can handle any complex data dependence pattern which cannot be handled by any other previous research, and also reveals that if the input loop doesn't have uniform workload, the scheduling should be considered. Furthermore, the efficiency of the insp/exec method is also discussed.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT830394049 http://hdl.handle.net/11536/59072
Appears in Collections:	Thesis