Title: DOACROSS Loops Parallelization for Parallelizing Compilers
(平行编译器之 DOACROSS 回圈平行化)
Authors: Shih-Hung Kao (高世宏)
Shian-Shyong Tseng (曾宪雄)
Institute of Computer Science and Engineering
Keywords: parallelizing compiler; parallelism; run-time parallelization; data dependence
Issue Date: 1994
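Abstract: Among the sources of parallelism that a parallelizing
compiler can detect, loops hold a large share. Most existing
parallelizing compilers handle only DOALL loops. DOACROSS loops
also hold potentially abundant parallelism, yet many
parallelizing compilers ignore them. In this thesis, we propose
a framework for parallelizing DOACROSS loops. Under this
framework, the parallelization of DOACROSS loops is divided into
two parts: compile-time parallelization and run-time
parallelization. A DOACROSS loop with constant, uniform data
dependences can be parallelized successfully through appropriate
synchronization. However, if information about the data
dependences in the loop cannot be obtained at compile time, or
if the data dependences are irregular, the loop preprocessor
must make a conservative choice and sacrifice potential
parallelism. For this situation, we propose a run-time method to
handle such loops. The method is based on a two-phase structure
(an inspector phase and an executor phase). For the inspector
phase, we propose a general algorithm that improves how the loop
scheduling problem is handled. Experimental results show that
this general algorithm can handle any data dependence and runs
efficiently. In addition, the results show that when the
workload of a loop is non-uniform, different scheduling
strategies may need to be considered for the executor phase.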
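The compile-time case described above (a constant, uniform
dependence distance handled by synchronization) can be
illustrated with a minimal Python sketch. The abstract does not
say which synchronization primitives the thesis uses, so the
per-iteration post/wait flags, the function names, and the loop
body `a[i] = a[i - distance] + b[i]` are all assumptions chosen
for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def doacross(a, b, distance, workers=4):
    """Run `a[i] = a[i - distance] + b[i]` with post/wait flags.

    Hypothetical example loop; not taken from the thesis.
    """
    n = len(a)
    result = list(a)
    # One "post" flag per iteration; a sink iteration waits on its source.
    done = [threading.Event() for _ in range(n)]

    def body(i):
        if i >= distance:
            done[i - distance].wait()   # wait: source iteration has finished
            result[i] = result[i - distance] + b[i]
        # Iterations with i < distance have no loop-carried source.
        done[i].set()                   # post: dependent iterations may go

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Tasks are submitted in iteration order, so whenever an
        # iteration has started, its source has started as well.
        list(pool.map(body, range(n)))
    return result


def sequential(a, b, distance):
    """Reference version of the same loop, run in program order."""
    result = list(a)
    for i in range(distance, len(a)):
        result[i] = result[i - distance] + b[i]
    return result
```

With `distance = 2`, up to two consecutive iterations can overlap
while the flags preserve the order of dependent ones. Python
threads illustrate the ordering only; CPython's global
interpreter lock means this sketch should not be expected to
show a speedup.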
Loop-level parallelism is the most common source of parallelism
exploited by parallelizing compilers. Most existing
parallelizing compilers support only the parallelization of
DOALL loops. However, DOACROSS loops, which most current
parallelizing compilers ignore, also contain plentiful
parallelism. In this thesis, a model for parallelizing DOACROSS
loops is proposed. The parallelization of DOACROSS loops is
divided into two parts: compile-time parallelization and
run-time parallelization. A DOACROSS loop with a constant,
uniform dependence distance can be parallelized with proper
synchronization. It is also found that parallelizing DOACROSS
loops can yield an obvious speedup. However, if the dependence
distance is non-uniform, or if an array index is itself an
indexing function, the data dependence test becomes
conservative. A run-time method is proposed to handle such
loops. This method is based on the inspector/executor loop
transformation (an inspector phase and an executor phase). We
propose a general algorithm for the inspector phase that
improves its ability to solve the loop scheduling problem. Our
algorithm determines the wavefronts of a loop with arbitrarily
complex array reference relations by building a DEF-USE table.
The experimental results show that the new algorithm can handle
complex data dependence patterns that previous research could
not handle. They also reveal that if the input loop does not
have a uniform workload, the scheduling strategy should be
taken into account. Furthermore, the efficiency of the
inspector/executor method is discussed.
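The inspector/executor idea in the abstract can be sketched as
follows. This is a minimal illustration, not the thesis's
algorithm: the abstract only names a "DEF-USE table", so the two
dictionaries `last_def` and `last_use`, the function names, and
the loop shape `x[writes[i]] = f(x[r] for r in reads[i])` are
assumptions. The inspector assigns each iteration a wavefront
number that respects flow, anti, and output dependences; the
executor then runs the wavefronts in order.

```python
def inspect_wavefronts(writes, reads):
    """Inspector: return the wavefront number of every iteration.

    writes[i] is the element written by iteration i; reads[i] lists
    the elements it reads. Both are hypothetical run-time index data.
    """
    last_def = {}  # element -> wavefront of the latest writer
    last_use = {}  # element -> highest wavefront among its readers
    wavefronts = []
    for w, rs in zip(writes, reads):
        level = 0
        for r in rs:
            if r in last_def:              # flow dependence (write, then read)
                level = max(level, last_def[r] + 1)
        if w in last_def:                  # output dependence (write, then write)
            level = max(level, last_def[w] + 1)
        if w in last_use:                  # anti dependence (read, then write)
            level = max(level, last_use[w] + 1)
        wavefronts.append(level)
        for r in rs:
            last_use[r] = max(last_use.get(r, -1), level)
        last_def[w] = level
    return wavefronts


def execute_by_wavefront(x, writes, reads, wavefronts, f):
    """Executor: run the iterations one wavefront at a time."""
    groups = {}
    for i, level in enumerate(wavefronts):
        groups.setdefault(level, []).append(i)
    for level in sorted(groups):
        # Iterations within one wavefront are mutually independent,
        # so a parallel runtime could distribute them across processors.
        for i in groups[level]:
            x[writes[i]] = f([x[r] for r in reads[i]])
    return x
```

For example, with `writes = [0, 1, 2, 0, 3]` and
`reads = [[4], [0], [1], [5], [0]]`, the inspector returns
`[0, 1, 2, 2, 3]`: iterations 2 and 3 share a wavefront and could
run concurrently, while the others form a dependence chain. How
the iterations of one wavefront are distributed is the scheduling
question the abstract raises for loops with non-uniform workloads.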
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT830394049
http://hdl.handle.net/11536/59072
Appears in Collections: Thesis