標題: | 在數位訊號處理器架構下有效指令排程法之研究 A Study on Effective Instruction Scheduling Methods for DSP Architecture |
作者: | 李宜軒 Yi-Hsuan Lee 陳正 Cheng Chen 資訊科學與工程研究所 |
關鍵字: | 數位訊號處理器;指令排程;低功率消耗;變數分割;Digital Signal Processor;Instruction Scheduling;Low Power Consumption;Variable Partition |
公開日期: | 2006 |
摘要: | 隨著多媒體通訊日益劇增的需求,陸續發展出許多關於科學計算及數位訊號處理的方法。這些應用程式以規則相依迴圈為主,計算複雜度很高,大部分時間都是執行ALU運算指令。數位訊號處理器 (digital signal processor, DSP) 是一種為特殊目的設計的微處理器,通常包含多個獨立的資料記憶體模組 (multiple data memory banks),並採用heterogeneous register sets方式;而要有效利用這些架構特性,顯然需要充分的編譯技術支援。為了提高數位訊號處理應用程式的執行效能,在編譯過程中必須開發迴圈之間潛在的平行度,並盡量減少額外指令 (spill codes) 的產生。同時,由於攜帶型電子裝置的逐漸普及,功率消耗也成為另一個重要的設計議題;若是能從高階合成 (high-level synthesis) 的觀點來考慮功率消耗,通常能以較低的代價 (cost),來有效降低功率消耗。
在本論文中我們將針對包含多重資料記憶體模組的數位訊號處理器以及規則相依迴圈,設計有效的指令排程法。對於這種系統架構,完整的編譯過程必須涵蓋多個步驟,由於這些步驟彼此之間有複雜的資料相依性,同時考慮多個步驟將有助於得到較佳的排程結果。本論文主要分為三個研究議題,首先我們設計三個簡單的變數分割機制 (variable partition mechanism),以及三個對應的指令排程法rotation scheduling with unfolding (RSF)、rotation scheduling with tiling (RST) 和rotation scheduling with parallelization (RSP),不考慮暫存器 (accumulator/register) 的配置。第二個研究議題我們先針對Motorola DSP56000這顆數位訊號處理器的架構特性,提出指令排程法rotation scheduling with spill codes predicting (RSSP),涵蓋編譯過程的所有步驟。RSSP的特色是在實際排程指令之前事先預測暫存器滿溢 (accumulator spill) 發生的時機,並產生對應的spill codes。接著我們提出指令排程法rotation scheduling with spill codes avoiding (RSSA),它是RSSP的延伸,可以適用於多種特性類似的架構。RSSA同時將縮短排程長度和減少spill codes列為排程目標,也使用其他的機制解決accumulator spill,不再預測其發生的時機。除此之外,我們定義一個虛擬架構模組 (hypothetical machine model),配合RSSA深入探討不同系統資源數量改變時對排程結果造成的影響。最後在第三個研究議題中我們進一步延伸RSSA,利用運算元分享 (operand sharing) 的方式,提出二個低功耗指令排程法rotation scheduling with operand reutilization (RSOR) 和 rotation scheduling with exploiting operand reutilization (RSER)。RSOR只在單一迴圈元素 (iteration) 內令運算元重複使用。RSER則是設計一個迴圈轉換 (loop transformation) 機制,尋找在不同迴圈元素內指令共用運算元的情形,以增加運算元重複使用的機會。
在描述所有提出方法的特性之後,我們選擇數個數位訊號處理的應用程式來評估執行效能,分別使用排程長度、指令個數以及運算元重複使用次數等三個標準。另外我們也定義數學模組,用來計算迴圈轉換之後整體排程長度和運算元重複使用次數。初步評估,所有提出的指令排程法都能達到預期的效能。 As the multimedia and communication flourishing, many scientific and digital signal processing applications are developed. These applications are iterative and data-dominated, which are usually represented by uniform loops and characterized by a predominance of arithmetic instructions. A digital signal processor (DSP) is a special-purpose microprocessor designed to achieve high performance in digital signal processing applications, and commonly employs architecture with irregular data paths, multiple data memory banks, and heterogeneous register sets. Sufficient compiler support is apparently important to harvest benefits of this architecture. To optimize the throughput of such digital signal processing applications, we need to explore the embedded parallelism of a loop and generate fewer spill codes. As the portable system market grows rapidly, power becomes another critical constraint in the design specification. If we consider low power design at high-level synthesis, we can obtain much more effective power reduction with less cost and effort. In this thesis we will focus on designing code generation methods to schedule uniform loops on DSP architecture with multiple data memory banks. The complete code generation process for this architecture must include several phases, and to consider more phases at the same time may lead more effective results due to their extremely data dependent. Our research contains three main issues. For this first issue we design three efficient variable partition mechanisms and three corresponded methods rotation scheduling with unfolding (RSF), rotation scheduling with tiling (RST), and rotation scheduling with parallelization (RSP) without considering the accumulator/register assignment. In the second issue we first present method rotation scheduling with spill codes predicting (RSSP) focus on Motorola DSP5600 covering all code generation phases. The main feature of RSSP is to predict the occurrence of accumulator spills, and generate corresponding spill codes in advance. After that, we generalize RSSP to rotation scheduling with spill codes avoiding (RSSA), which can suit various DSPs with similar architectural features. The scheduling goal of RSSA is to achieve both shorter schedule length and fewer spill codes. Meanwhile, new mechanisms are designed for resolving accumulator spills instead of predicting their occurrences. Besides, we also evaluate RSSA on a defined hypothetical machine model, and deep study the influence of differing number of resources on the scheduling result. Finally, two energy-efficient code generation methods rotation scheduling with operand reutilization (RSOR) and rotation scheduling with exploiting operand reutilization (RSER) are proposed in our third issue. These two methods are extended from RSSA to further consider the operand sharing technique. In RSOR only the potential operand sharing within an original iteration is considered. In RSER we design a novel loop transformation mechanism to reconstruct the given loop, to find instructions with common operands hidden in different original iterations. In addition to present detailed principles of all proposed methods, we select some MDFGs to evaluate their performances based on metrics schedule length, instruction count, and the number of operand reutilizations. We also design analytic models for every proposed method, which can calculate the overall scheduling length and the number of operand reutilizations of a reconstructed loop. Preliminary evaluations show that all proposed methods can achieve desirable results. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT008817519 http://hdl.handle.net/11536/60779 |
Appears in Collections: | Thesis |
Files in This Item:
If it is a zip file, please download the file and unzip it, then open index.html in a browser to view the full text content.