Cache Prefetch Techniques and Bus Bridge Design in SOC

Title:	Cache Prefetch Techniques and Bus Bridge Design in SOC (系統晶片中快取記憶體預取技術及排線橋設計)
Authors:	Nelson Yen-Chung Chang (張彥中); Chein-Wei Jen (任建葳); Institute of Electronics
Keywords:	Cache; Prefetching; Bridge; System Bus; Time Stride
Publication date:	2001
Abstract:	Cache prefetching has long been used to reduce the cache miss rate and to hide the memory access latency seen by the processor in a processor-based system. It lets a smaller cache with a prefetch mechanism match the miss rate of a larger cache without prefetching, which reduces cache hardware cost. Although a lower miss rate improves cache performance, the extra prefetch memory requests increase overall system bus traffic, and the added traffic can reduce overall system performance even when the miss rate falls. In an embedded SOC, more devices access memory through the shared system bus, so heavy bus traffic limits the benefit of applying cache prefetching. Because hardware prefetching uses run-time information and can take the system bus status into account, it is better suited to embedded systems with multiple master devices. In this thesis, we investigate the characteristics of several hardware cache prefetching techniques. We then propose a new cache prefetching scheme, reference time stride prefetch (RTSP), which incorporates access timing information, together with a system bus bridge design that reorders the processor's accesses, to solve the bus congestion problem.
The effects of the relevant parameters, and the way prefetching affects an embedded system, are revealed by cycle-by-cycle trace-driven simulations of an embedded system model with an ARM7TDMI core and an AHB system bus. The simulation results show that, under the default parameters and setup, RTSP combined with the reordering bus bridge reduces average data reference time by 8.8% on average and the data miss rate by more than 90% compared with a system without prefetching.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT900428098 http://hdl.handle.net/11536/68788
Appears in Collections:	Thesis