Title: | Reducing Timing Discrepancy for Energy-Efficient On-Chip Memory Architectures with Low Voltage Processors |
Author: | Wang, Po-Hao (王柏皓); Chen, Tien-Fu (陳添福); Institute of Computer Science and Engineering |
Keywords: | Low-voltage processor; Fault-tolerant cache; Timing discrepancy reduction |
Issue Date: | 2017
Abstract: | Dynamic voltage and frequency scaling (DVFS) is an effective way to reduce energy consumption in modern processor systems. However, as the supply voltage is lowered, the latency of on-chip memory typically degrades faster than that of the processor core, so the latency gap between memory and core widens at low voltage. The resulting timing discrepancy between on-chip memory and the processor core degrades system performance. It stems mainly from a small number of SRAM cells that suffer severe process variation and therefore produce access-time faults when accessed. Fortunately, these faults can be remedied by providing sufficient access time. Most previous fault-tolerant designs sacrifice memory capacity or lengthen the access latency, which makes them unsuitable for latency-sensitive memories such as level-1 (L1) caches and local memories. Moreover, as operating voltages drop and technology nodes shrink, the number of slow cells that cause access-time faults keeps growing. Tolerating numerous access-time faults without a large performance penalty is therefore becoming a critical issue in processor systems.
To address this issue, this dissertation analyzes the characteristics of the static random-access memory (SRAM) commonly used in modern low-voltage processor systems. Based on these observations, three access-time-fault tolerance techniques are proposed for on-chip memories, each targeting a different purpose.
The first design is the Zero-Counting Error Detection Code (ZC-EDC) on 8T SRAM, which is applicable to different memory structures such as caches, local memories, and translation lookaside buffers (TLBs). To stay applicable across these structures, the fault-tolerance scheme must incur no capacity loss. Because access-time faults in 8T SRAM occur only when reading '0' bits, ZC-EDC uses a lightweight error-detection code, a count of the '0' bits, to detect access-time faults dynamically and then extends the access time to tolerate them. In addition, to further improve the average access time of L1 caches, we analyze the locality effect and propose a timing-aware LRU policy that keeps hot data in the faster blocks.
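A minimal behavioral sketch may make the detect-and-retry flow concrete. It is a hypothetical Python model, not the dissertation's hardware: WORD_BITS, ZCEDCWord, and raw_read are illustrative names, and the model only encodes the stated 8T-SRAM property that a timing fault turns a stored '0' into a read '1', so a faulty fast read always shows fewer zeros than the stored count.

```python
WORD_BITS = 32

def zero_count(word: int) -> int:
    """Number of '0' bits in a WORD_BITS-wide word."""
    return WORD_BITS - bin(word & ((1 << WORD_BITS) - 1)).count("1")

class ZCEDCWord:
    """A stored word plus its '0'-count check bits."""
    def __init__(self, value: int):
        self.value = value
        self.check = zero_count(value)

    def read(self, raw_read):
        word = raw_read(extended=False)      # try the fast access first
        if zero_count(word) != self.check:   # fewer zeros => timing fault
            word = raw_read(extended=True)   # retry with extended timing
        return word

# Toy usage: a slow cell makes a stored 0 (bit 2) read as 1 under fast timing.
cell = ZCEDCWord(0b1010)

def raw_read(extended: bool) -> int:
    return cell.value if extended else (cell.value | 0b0100)

assert cell.read(raw_read) == 0b1010
```

Since this kind of fault can only lower the zero count, one counter per word also flags multi-bit faults; the retry then supplies the longer access time that, as stated above, is sufficient to read the slow cells correctly.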
The second design is the Cross-Matching cache (CM-cache), which focuses on raising the access-time-fault tolerance of L1 caches built with 8T SRAM. CM-cache first introduces a Dynamic Timing Calibration 8T SRAM (DTC-SRAM), based on the characteristics of 8T SRAM, that calibrates the read latency required by each cache line while the processor runs. On top of DTC-SRAM, we propose three cache management strategies for different degrees of fault tolerance, including a bit-level access-time-fault mask. The design detects the influence of the stored values at run time and adaptively adjusts the required access time.
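As a rough illustration of per-line calibration, the sketch below keeps one latency setting per cache line. FAST_CYCLES, SLOW_CYCLES, and TimingTable are assumed names and values, and the reset-on-refill rule is an inference from the statement that the required timing depends on the value currently stored in the line.

```python
FAST_CYCLES, SLOW_CYCLES = 2, 3        # assumed latency settings (cycles)

class TimingTable:
    """One calibrated read-latency setting per cache line."""

    def __init__(self, num_lines: int):
        self.latency = [FAST_CYCLES] * num_lines

    def on_fill(self, line: int) -> None:
        # New data may no longer expose the slow cells, so retry fast.
        self.latency[line] = FAST_CYCLES

    def on_timing_fault(self, line: int) -> None:
        # The fast setting is unsafe for the value held in this line.
        self.latency[line] = SLOW_CYCLES

    def read_latency(self, line: int) -> int:
        return self.latency[line]

# Toy usage: line 5 shows a timing fault, then is refilled with new data.
table = TimingTable(num_lines=64)
table.on_timing_fault(5)
assert table.read_latency(5) == SLOW_CYCLES
table.on_fill(5)
assert table.read_latency(5) == FAST_CYCLES
```

Under such a table, only the lines whose current contents actually expose a slow cell pay the longer latency.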
The third design is the Ally cache for L1 caches. By storing the same data in multiple cache lines and triggering the corresponding wordlines together, cache lines are "allied". Allied lines speed up cache accesses, achieve cell-level tolerance of access-time faults, and enable reliable low-voltage operation. Unlike the two designs above, the Ally cache does not rely on the characteristics of 8T SRAM, so it can be applied to 6T, 8T, and similar SRAM cells. However, data allying introduces a large capacity loss and extra access energy, so we propose a data-ally management strategy for L1 caches that removes the unnecessary energy overhead of the Ally cache. |
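To illustrate why allying masks slow cells, the toy timing model below assumes that when two lines holding the same data raise their wordlines together, each bit position is driven by two cells in parallel and its effective delay is roughly the faster of the two. This is an illustrative assumption made for the sketch, not the dissertation's circuit analysis, and all names and numbers are hypothetical.

```python
def allied_line_delay(delays_a, delays_b):
    """Read latency of an allied pair: the slowest bit position decides,
    but every position may use the quicker of its two duplicate cells."""
    return max(min(a, b) for a, b in zip(delays_a, delays_b))

# Per-bit cell delays (ns) of two lines that store the same data.
delays_a = [1.0, 1.8, 1.0, 1.0]   # bit 1 holds a slow cell
delays_b = [1.0, 1.0, 1.0, 1.9]   # bit 3 holds a slow cell

# Each slow cell is masked by its healthy partner, so the allied pair
# meets the fast 1.0 ns target that neither line meets on its own.
assert allied_line_delay(delays_a, delays_b) == 1.0
assert max(delays_a) > 1.0 and max(delays_b) > 1.0
```

The same model makes the cost visible: every allied access drives two lines and every fill stores the data twice, which is the capacity loss and access-energy overhead that the data-ally management strategy is meant to contain.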
URI: | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070086017 http://hdl.handle.net/11536/140688 |
Appears in Collections: | Thesis |