Title: | WP-TLB: Way Prediction for Set-Associative L2 Cache to Save Dynamic Read Energy |
Authors: | Tzu-Min Chou (周資敏); Jyh-Jiun Shann (單智君); Institute of Computer Science and Engineering |
Keywords: | low power; L2 cache; way prediction |
Publication date: | 2008 |
Abstract: | An L2 cache is usually implemented as a set-associative cache with high associativity, so each access consumes more energy and takes longer than an access to a direct-mapped cache of the same size. If the way that holds the requested data is known in advance and only that way is activated, the energy consumption and access latency approach those of a direct-mapped cache the size of a single L2 way.
In this thesis, we propose a way-prediction design for the L2 cache. The way indices used to access the L2 cache are stored in an extended Translation Lookaside Buffer (TLB), which we call the WP-TLB. The main goal is to save dynamic read energy without any loss of performance; a secondary goal is to reduce the average L2 access latency. The key property of the design is that only one L2 way is accessed whether or not the way prediction is correct: even when a way misprediction occurs, the other ways never need to be probed to find the requested data.
We use CACTI 4.2 to estimate the energy consumption and access latency of the memory components, and we run the SPEC2000 benchmarks on a SimpleScalar 3.0 simulator modified to include our design. For a 256 KB 16-way L2 cache, the best case saves about 65% of dynamic power and reduces the average L2 access latency by 17%, while static power increases by only about 0.6%. The design causes no loss in overall performance. |
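The abstract does not spell out how the WP-TLB keeps its way information exact enough that a wrong prediction never forces a probe of the other ways. The Python sketch below shows one hypothetical way to obtain that property; it is not the thesis's implementation. Each TLB entry carries a way index (or `None`) for every L2 line of its page. The index is written when the L2 fills a line and cleared when the L2 evicts it, so a recorded index always names the correct way, and a missing one means the line is not in the L2 at all. In this simplification, such a line goes straight to memory without opening any way, which stays within the "at most one way" behavior the abstract describes. Only the 256 KB 16-way geometry comes from the abstract. The 64 B line size, 4 KB pages, LRU replacement, the assumption that TLB entries are never evicted, and all names (`WPTLBEntry`, `WPTLBModel`, `access`, `ways_read`) are assumptions made for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass, field

# Hypothetical geometry: 256 KB, 16-way L2 (from the abstract); 64 B lines and 4 KB pages are assumptions.
LINE_BYTES = 64
PAGE_BYTES = 4096
LINES_PER_PAGE = PAGE_BYTES // LINE_BYTES               # 64 L2 lines per page
NUM_WAYS = 16
NUM_SETS = 256 * 1024 // (LINE_BYTES * NUM_WAYS)        # 256 sets


@dataclass
class WPTLBEntry:
    """Hypothetical WP-TLB entry: translation plus one way index per L2 line of the page."""
    ppn: int
    ways: list = field(default_factory=lambda: [None] * LINES_PER_PAGE)  # None = line not in L2


class WPTLBModel:
    """Counts L2 ways read when the way comes from the TLB vs. a parallel all-ways lookup."""

    def __init__(self, page_table):
        self.page_table = page_table     # vpn -> ppn, assumed one-to-one
        self.tlb = {}                    # vpn -> WPTLBEntry; simplification: entries are never evicted
        self.owner = {}                  # ppn -> vpn, to find the entry when L2 evicts a line
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]  # per set: tag -> way, oldest first (LRU)
        self.ways_read = 0               # ways opened under the WP-TLB scheme
        self.ways_read_conventional = 0  # ways a conventional parallel lookup would open

    def _entry(self, vpn):
        if vpn not in self.tlb:
            ppn = self.page_table[vpn]
            self.tlb[vpn] = WPTLBEntry(ppn)
            self.owner[ppn] = vpn
        return self.tlb[vpn]

    def access(self, vaddr):
        """One L2 read (i.e. after an L1 miss); returns "hit" or "miss"."""
        vpn, offset = divmod(vaddr, PAGE_BYTES)
        entry = self._entry(vpn)
        line_in_page = offset // LINE_BYTES
        line_addr = entry.ppn * LINES_PER_PAGE + line_in_page   # physical line address
        set_idx, tag = line_addr % NUM_SETS, line_addr // NUM_SETS
        lines = self.sets[set_idx]
        self.ways_read_conventional += NUM_WAYS   # all ways probed in parallel

        way = entry.ways[line_in_page]
        if way is not None:
            # The recorded way is always correct in this model, so only that way is opened.
            self.ways_read += 1
            lines.move_to_end(tag)                # refresh LRU position
            return "hit"

        # No recorded way means the line is not in L2: open no way and fetch from memory.
        if len(lines) < NUM_WAYS:
            used = set(lines.values())
            way = next(w for w in range(NUM_WAYS) if w not in used)
        else:
            victim_tag, way = lines.popitem(last=False)          # evict the LRU line
            victim_line = victim_tag * NUM_SETS + set_idx
            victim_ppn, victim_idx = divmod(victim_line, LINES_PER_PAGE)
            self.tlb[self.owner[victim_ppn]].ways[victim_idx] = None  # keep the TLB consistent
        lines[tag] = way
        entry.ways[line_in_page] = way            # remember where the line was placed
        return "miss"


if __name__ == "__main__":
    model = WPTLBModel({0: 7, 1: 3})              # hypothetical vpn -> ppn mapping
    for va in (0x0000, 0x0010, 0x1040):
        print(hex(va), model.access(va))
    print("ways read:", model.ways_read, "conventional:", model.ways_read_conventional)
```

The counters give only a first-order count of activated ways. A conventional lookup opens all 16 ways on every L2 read, while this sketch opens at most one. They are not an energy estimate: the thesis derives its energy and latency figures from CACTI 4.2.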
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009555614 http://hdl.handle.net/11536/39566 |
Appears in Collections: | Theses |