Title: Register Allocation Techniques for GPU Shader Processors
Authors: Chen, Szu-Chieh (陳思捷); You, Yi-Ping (游逸平)
Institute of Computer Science and Engineering
Keywords: register allocation; shader processors; register packing
Publication date: 2011
Abstract: Graphics processing units (GPUs) are widely used in embedded systems, both for computer graphics and for general-purpose computation. Embedded systems, however, must manage limited hardware resources to achieve high performance and energy efficiency, and the number of registers is a common limiting factor in embedded GPU designs. If the physical registers are insufficient and register allocation is poorly designed, programs suffer from high register pressure. The problem is aggravated on shader processors, where each register is divided into four elements that can be accessed individually: allocating a whole register to a vector-typed variable that does not use all four elements wastes register space. This thesis presents a vector-aware register allocation framework that improves register utilization on shader architectures and thereby reduces register spills. The framework has two major components: (1) element-based register allocation, which allocates to each variable only as many elements as it requires, and (2) register packing, which rearranges the elements of registers to create more contiguous free elements so that more live variables can be kept in registers. Experiments show that the proposed allocation reduces register usage by an average of 19% and register spills by an average of 92%. When combined with power management, it also yields a larger reduction in energy consumption (from 1.7% to 4.7%) than a previous work that applied a power-control mechanism to the output buffer to which registers are spilled.
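The two components described in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the thesis's algorithm: it assumes registers of four elements, requires each variable to occupy contiguous elements of a single register, places variables first-fit, and models register packing (the hypothetical `pack` method) as re-placing every live variable widest-first, which gathers the free elements of each register into one contiguous run. A real compiler would also emit move instructions for relocated variables and choose spill candidates more carefully; here a variable that still does not fit is simply marked as spilled.

```python
ELEMENTS_PER_REG = 4  # assumption: each shader register holds four elements (x, y, z, w)


class RegisterFile:
    """Toy element-level register file (hypothetical model, not the thesis's code)."""

    def __init__(self, num_regs):
        self.num_regs = num_regs
        # slots[r][e] holds the name of the variable occupying element e of register r
        self.slots = [[None] * ELEMENTS_PER_REG for _ in range(num_regs)]
        self.location = {}   # var -> (register, first element, width)
        self.spilled = set()

    def _find_run(self, width):
        """First-fit search for `width` contiguous free elements within one register."""
        for r, reg in enumerate(self.slots):
            for start in range(ELEMENTS_PER_REG - width + 1):
                if all(reg[start + i] is None for i in range(width)):
                    return r, start
        return None

    def _place(self, var, width, r, start):
        for i in range(width):
            self.slots[r][start + i] = var
        self.location[var] = (r, start, width)

    def free(self, var):
        """Release the elements of a variable whose live range has ended."""
        r, start, width = self.location.pop(var)
        for i in range(width):
            self.slots[r][start + i] = None

    def pack(self):
        """Simplified register packing: re-place all live variables widest-first.

        Placing leftmost-first keeps each register's free elements as one contiguous
        suffix, and with widths 1..4 this never needs more registers than the
        previous placement, so every live variable fits again.
        """
        live = sorted(self.location.items(), key=lambda kv: -kv[1][2])
        self.slots = [[None] * ELEMENTS_PER_REG for _ in range(self.num_regs)]
        self.location = {}
        for var, (_, _, width) in live:
            spot = self._find_run(width)
            assert spot is not None
            self._place(var, width, *spot)

    def allocate(self, var, width):
        """Element-based allocation: give `var` exactly `width` contiguous elements."""
        assert 1 <= width <= ELEMENTS_PER_REG
        spot = self._find_run(width)
        if spot is None:          # free elements exist but are fragmented: pack first
            self.pack()
            spot = self._find_run(width)
        if spot is None:          # still no room: spill (simplified policy)
            self.spilled.add(var)
            return None
        self._place(var, width, *spot)
        return spot


rf = RegisterFile(num_regs=2)
for v in "abcd":
    rf.allocate(v, 2)             # four 2-element variables fill both registers
rf.free("a")
rf.free("d")                      # free elements now split: reg 0 [0,1], reg 1 [2,3]
print(rf.allocate("e", 3))        # fits only after packing moves b and c together
print(rf.slots)
```

With two registers, the four 2-element variables occupy every element. After `a` and `d` die, four elements are free but no register has three contiguous free elements, so a whole-register or first-fit allocator alone would have to spill `e`; packing moves `b` and `c` into register 0, after which `e` fits in register 1.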
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079955534
http://hdl.handle.net/11536/50450
Appears in Collections: Thesis