標題: 超純量處理之研究
Study of superscalar processing
作者: 蕭裕弘
XIAO, YU-HONG
鍾崇斌
資訊科學與工程研究所
關鍵字: 超純量處理
公開日期: 1992
摘要: 本論文研究超純量指令排程的問題,因指令排程好壞對超純量處理機之效能有決定 性之影響。相對其重要性,超純量指令排程的研究目前仍感缺乏,或基於不合實際 條件的假設,或僅適合某一特殊架構。在本論文中,吾人研究三個相關問題,分別 為:(1)現存排程演算法則之分析;(2)最佳排程方法之設計;(3)有效的 啟發式排程演算法則的設計。 首先吾人設計兩個抽象模型,一個為超純量處理機模型,另一個為程式模型。這兩 個模型能反映大部份超純量指令排程問題所具有的特性及統一研究工作。 吾人首先分析排程演算法則本身。大部份現存排程演算法則之評估是採取測試程式 方式。但此舉費時且受樣本數目之限制或代表性樣本選擇之困難,此法並不能有效 發掘程式之效能。為此,吾人以最壞情況分析的方法來評估排程演算法則。吾人發 現,現存排程演算法則皆為貪心排程方式,且最佳排程亦可轉換成貪心排程。因此 ,吾人分析任意一個貪心排程的最壞情況。令w 表示一個貪心排程的長度, greedy 而w 表示一個最佳排程的長度。吾人研究 w /w 的最大值。吾人導 opt greedy opt 出w /w <k+1-1/max{m }(z+1) ,其中k為處理機型式之數目,m 為 greedy opt ▔ i i 第i種處理機之數目,而z為指令執行所產生最大的延遲週期。此成果具廣泛性能 涵蓋大部份此類問題,且當問題模型簡化後,此成果涵蓋前人在單位時間工作排程 研究之成果。 然而所導出w /w 之比值令人失望,當k=4 (如IBMRS/6000)時,此比值接 greedy opt 近5。因此需要一較好之演算法則。在此,吾人研究最佳排程的演算法則,因並無 令人滿意之現存方法。吾人相信如能有效利用程式中的一些特性,達成最佳排程法 並不困難。經吾人徹底研究後,定義了指令間及部份排程間的優勢及相等關係。這 些關係可用來大幅縮小答案空間。吾人亦實際實驗此方法,其結果顯示此方法之優 越性。 為實際應用起見,一個有效的啟發式演算法則仍有存在之價值。因此,吾人設計了 一個排程演算法則,稱為ECG-演算法則。相對於一般關鍵路徑法,此演算法則在指 定指令排程優先權時作較多的考慮。吾人以最壞情況分析的方法來證明ECG-演算法 則之優越性。令w 表示一個ECG-排程的長度。吾人証明w /w <k+1-2/ ECG ECG opt ▔ max{m }(z+1) 。此值為目前已之者中最小者。 i 本研究成果令人滿意與鼓舞。尤有甚者,此問題之模式、解題方法及分析技術皆具 廣泛性,能適用於大部份此類問題。
In this dissertation, to eliminate or reduce the penalties caused by the possible hurdles, we study the following topics: 1) modeling of superscalar processing; 2) performance improvements provided by various hardware enhancements; 3) performance improvements provided by various techniques of software instruction scheduling; 4) design of a new live range definition and a new register allocation algorithm especially for superscalar processors. At the beginning of this dissertation, three superscalar architectural models, the XPCB, XXPB, and X4P2 models, are first proposed and used as target machine models in this study. Performance evaluations are made to determine which of these three architectural models is the most cost- effective. Considering cost-effectiveness and to avoid excessive blocking due to flow dependencies, simulation results show that the XXPB model is a judicious choice in the design of a superscalar processor system. Register renaming, operand copying, out-of-order instruction issuing, and branch prediction are four hardware methods for enhancing the performance of a superscalar processor. We study the improvement in execution time provided by these four enhancements in superscalar processing. The performance of various software instruction scheduling techniques are also studied in this dissertation. Finally, since the live range definition for a symbolic register and the corresponding register allocation algorithm affect the instruction parallelism exploitable by the architectures, these issues must also be studied. In this dissertation, the effects of hardware instruction scheduling, software instruction scheduling, and several related topics on performance improvement for superscalar processing are studied. We hope the information presented in this research can contribute to the design of future high- performance superscalar systems.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT812392001
http://hdl.handle.net/11536/57218
Appears in Collections:Thesis