Title: PMP Memory Consistency Models
Author: 伍朝欽
Wu, Chao-Chin
陳正
Cheng Chen
Institute of Computer Science and Engineering (資訊科學與工程研究所)
Keywords: Memory Consistency Model; Multiprocessor; Multithread; Cache coherence protocol; Synchronization; Performance evaluation
Publication date: 1997
Abstract: A Parallel-Multithreaded Processor (PMP) is characterized by several parallel threads sharing a single cache hierarchy. A PMP-MP is a multiprocessor system whose processing elements all have the PMP architecture. This dissertation focuses on the design of memory consistency models specifically for PMP-MP systems.
First, we construct a software evaluation platform called SEESMA to verify the effectiveness of our designs. Because it offers versatile simulation functions, SEESMA provides a good research environment for those interested in system design for shared-memory multiprocessor architectures. Because the memory consistency model and the cache coherence protocol together enforce data consistency, we also use SEESMA to study which kind of cache coherence protocol suits PMP-MP systems. The simulation results show that the clean coherence protocol delivers the best execution performance for the vast majority of the benchmark programs; when a write cache is added, the clean protocol improves system performance evenly across all benchmark programs.
Second, we propose a memory consistency model called the PMP-MP Specific Consistency (PSC) model to boost system performance. The PSC model takes the characteristics of the PMP architecture into account and exploits the observation that barrier synchronization and critical sections, according to their individual semantics, impose different ordering restrictions on memory accesses. Given the advantages of the write cache, the hardware implementation of this model gives each processing element a specially designed dual write-cache and three counters. In our evaluation, the PSC model outperforms the release consistency model by up to 11%.
Third, despite the superiority of the PSC model, we find that barrier synchronization is composed of one or more critical sections. We therefore propose another memory consistency model, the Grouping Consistency (GC) model, which considers only the semantics of critical sections. We use the notion of a queue-based lock to implement a GC system. The GC model outperforms the release consistency model by up to 25%.
Finally, the PSC and GC models allow more pipelining and buffering of memory accesses. If cache bandwidth is insufficient, however, the newly exposed parallelism cannot be fully exploited. A multi-bank cache is commonly adopted to provide high-bandwidth access. In the last research topic, we propose a technique that reduces or even eliminates cache bank conflicts to prevent performance degradation.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT860392008
http://hdl.handle.net/11536/62736
Appears in Collections: Thesis