Reducing Instruction Fetch Energy with a Branch-Target-Buffer-Assisted Loop Buffer That Can Store Branches and Subroutines

Title:	Power Reduction in Instruction Fetch Using Forward-Branch and Subroutine Bufferable Innermost Loop Buffer with Assistance of BTB
Authors:	Bin-Hua Tein (田濱華); Chung-Ping Chung (鍾崇斌); Institute of Computer Science and Engineering
Keywords:	low power; loop buffer; instruction fetch power
Publication date:	2005
Abstract:	Reducing the power consumption of embedded processors, and thereby extending the operating time of mobile devices, is becoming increasingly important. In a typical embedded processor, instruction fetch accounts for a large share of the dynamic power. A recently proposed remedy adds a tiny memory between the CPU core and the instruction cache: because instructions exhibit temporal locality, most of them can be fetched from this tiny memory instead of from the instruction cache. Loops have strong temporal locality, so many loop buffer designs have been proposed. However, design complexity restricts most loop buffers to storing either innermost loops that contain no forward branches or subroutine calls, or only the instructions of an innermost loop that precede the first forward branch or subroutine call. Although program modeling shows that typical programs are best represented by a simple loop model, many innermost loops contain forward branches and subroutine calls, so existing designs leave room for further reduction in instruction fetch power. We propose a simple and effective way to buffer innermost loops that contain forward branches. Since a branch target buffer (BTB) is standard in most modern embedded processor designs, we add one extra bit to each BTB entry indicating whether the loop buffer holds the fall-through trace or the target trace that follows a forward branch inside the innermost loop. This avoids most of the design complexity. Innermost loops that call a subroutine containing no loop are handled in a similar way. Simulation results with MiBench show that, compared with a loop buffer that cannot store forward branches and subroutines, the proposed design further reduces instruction fetch power by 13.66%, and by up to 18%.
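The extra BTB bit described in the abstract can be illustrated with a small fetch-counting model. This is a minimal sketch under stated assumptions, not the thesis's actual design: the loop shape (one forward branch whose two paths rejoin before the backward branch), filling the buffer during the first iteration, falling back to the I-cache for the rest of an iteration when the branch direction disagrees with the bit, refilling the buffer on a mismatch, and the names `LoopShape`, `BTBEntry`, and `simulate` are all hypothetical. Subroutine handling is not modeled.

```python
from dataclasses import dataclass


@dataclass
class LoopShape:
    """Hypothetical innermost loop containing one forward branch.

    prefix: instructions from the loop head up to and including the forward branch
    fall:   instructions on the fall-through path
    taken:  instructions on the taken (branch-target) path
    suffix: instructions after the two paths rejoin, ending with the backward branch
    """
    prefix: int
    fall: int
    taken: int
    suffix: int


@dataclass
class BTBEntry:
    # Assumed extra BTB bit: True if the loop buffer holds the target trace
    # after this forward branch, False if it holds the fall-through trace.
    lb_holds_target: bool


def simulate(shape: LoopShape, outcomes: list[bool],
             handle_forward: bool = True) -> tuple[int, int]:
    """Return (loop-buffer fetches, I-cache fetches) over all iterations.

    outcomes[i] is True if the forward branch is taken in iteration i.
    handle_forward=False models a baseline buffer that holds only the
    instructions up to the forward branch.
    """
    lb_fetches = ic_fetches = 0
    entry = None
    for i, taken in enumerate(outcomes):
        path = shape.taken if taken else shape.fall
        rest = path + shape.suffix  # instructions after the forward branch
        if i == 0:
            # Assumption: the first iteration is fetched from the I-cache
            # while the buffer is filled with the trace actually executed.
            ic_fetches += shape.prefix + rest
            entry = BTBEntry(lb_holds_target=taken)
            continue
        # The instructions up to the forward branch are always buffered.
        lb_fetches += shape.prefix
        if handle_forward and entry.lb_holds_target == taken:
            # The BTB bit matches the branch direction: the buffered trace is valid.
            lb_fetches += rest
        else:
            # Direction differs from the buffered trace (or the baseline cannot
            # buffer past the branch): fetch the rest of this iteration from the
            # I-cache and, as an assumed policy, refill the buffer with that path.
            ic_fetches += rest
            entry.lb_holds_target = taken
    return lb_fetches, ic_fetches
```

For example, `simulate(LoopShape(4, 3, 2, 3), [False] * 10)` returns `(90, 10)`, while the prefix-only baseline (`handle_forward=False`) returns `(36, 64)` for the same loop. These figures only show how the model counts fetches and are not results from the thesis. Keeping a single bit per BTB entry means the buffer needs no second trace and no separate direction predictor; the cost is I-cache fetches on iterations whose branch direction differs from the buffered trace.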
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT009323587 http://hdl.handle.net/11536/79116
Appears in Collections:	Thesis

Files in This Item:

358701.pdf