Title: A Hybrid Mechanism for Prefetching Caches
Authors: Wen-Hui Lee (李文慧)
Chang-Jiu Chen (陳昌居)
Institute of Computer Science and Engineering
Keywords: prefetching cache; prefetch; branch-directed
Publication date: 1998
Abstract: The performance of the fastest available microprocessors is estimated to increase by approximately 50% per year, while the speed of memory systems grows by only about 5% to 10% per year. Because all data processed by the CPU comes from memory, this widening speed gap keeps the CPU from reaching its full performance and makes memory access a major bottleneck in high-performance computer systems. Memory system design is therefore an important topic. A cache hierarchy narrows the gap between processor and memory but does not close it, and prefetching into caches can reduce memory latency further. Many papers have proposed prefetching methods, some software-controlled and some hardware-based; this thesis focuses on hardware prefetching.
We propose two branch-directed prefetching techniques, one for the data cache and one for the instruction cache. Advanced branch predictors are already part of current microprocessor architectures, where they reduce stall time due to instruction fetching and can generally achieve prediction accuracy as high as 95% on SPEC benchmarks. Building on the accurate outcome of the branch predictor, we predict the next memory reference: once a branch prediction has been made, its result is used to decide whether to prefetch and which address to prefetch. Simulation results show that the branch-directed prefetchers have high accuracy, which avoids wasting memory bandwidth, but low coverage, so the improvement to the overall system is limited. We therefore propose three hybrid mechanisms that further reduce the cache miss rate. Simulations show that the hybrid mechanisms benefit different benchmark programs to different degrees, and we also vary the design parameters to observe their effect on performance. Based on all the simulation results, we arrive at a prefetch design that improves execution cycles by 11.46% on average, compared with an average improvement of 7.26% from simply doubling the cache size, and at lower hardware cost.
All designs are simulated with the SimpleScalar tool set, an execution-driven simulation environment developed at the University of Wisconsin, Madison. The benchmark programs are a subset of SPEC95.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT870392011
http://hdl.handle.net/11536/64032
Appears in collections: Theses