Title: A Study on Reducing Memory Access Latency for Cluster Multiprocessor System Design and Implementation of Its Simulation and Evaluation Environment
Authors: Wu, Jia-Rong
Cheng Chen
Institute of Computer Science and Engineering
Keywords: Cluster; Multiprocessor; Cache coherence protocol; Effective block replacement mechanism; Read broadcasting; Migratory sharing
Issue Date: 1997
Abstract: As the computational demands of applications keep growing, and because shared memory is easy to program, shared-memory multiprocessor architectures have become a trend in computer system development. Because communication exhibits locality, clustered systems have become an especially important direction. We design a clustered shared-memory multiprocessor system (this thesis covers the intra-cluster design) to study methods for reducing memory access latency on clustered systems. The simulation and evaluation environment is a program-driven simulator consisting of a memory reference generator provided by MINT and a memory subsystem simulator of our own design. The memory subsystem simulator is derived from SEESMA and models two-level caches, local buses, inter-cluster caches, two-level cache coherence protocols, and the interconnection network. Tested with several sets of benchmark programs, the environment runs correctly and efficiently and can be used to evaluate a range of architectural issues. It provides a good simulation platform for teaching and academic research, and it can also be combined closely with studies of parallel compilation techniques.
Using this simulation and evaluation environment, we study three important designs for reducing memory access latency in clustered multiprocessors: effective block replacement, read broadcasting, and migratory sharing. From the evaluation results we draw three main conclusions. (1) When block replacement is severe, an effective block replacement mechanism improves the performance of the whole clustered system by nearly 10%. (2) Read broadcasting reduces the number of read misses, and when the total number of processors in the system is fixed, a larger cluster size yields a larger reduction in read misses. (3) Reducing migratory sharing access latency improves a clustered system less than it improves a non-clustered system; our analysis shows that this is mainly because the communication locality of a clustered system gives it a shorter acquire stall time than a non-clustered system. Even so, reducing migratory sharing access latency improves performance considerably on both clustered and non-clustered multiprocessor systems. These results provide an important reference for designers of clustered systems.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT860392019
http://hdl.handle.net/11536/62748
Appears in Collections: Thesis