Title: 嵌入式記憶體系統之效能與耗能優化
Performance and Energy Optimization of Embedded Memory Systems
Author: 黃承威
Huang, Chen-Wei
曹孝櫟
Tsao, Shiao-Li
Institute of Computer Science and Engineering
Keywords: Embedded Systems; Memory System; Heterogeneous Cache System
Issue Date: 2014
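Abstract: Following the trend toward heterogeneous computing cores, memory systems are also beginning to combine memory components with different access characteristics into more diverse and flexible memory hierarchies. To make full use of these hardware components, system software designers must pay closer attention to what programs need at run time, and must design efficient address translation mechanisms and program phase detection so that the required code and data can be allocated or moved to the most suitable memory component in advance. Our work is an initial attempt at system software research on optimizing such heterogeneous memory systems. The hardware architecture studied is a heterogeneous memory system in which a hardware-managed cache and a software-managed cache coexist. We found that in such a system, the optimization decisions for classifying and managing code differ from those in a system with only a software-managed or only a hardware-managed cache. Besides classifying code by access count, it is more important to consider the misses that the code may cause in the software and hardware caches. In a system where both kinds of cache coexist, we can place code from the working set in the software-managed cache, or adjust code addresses, to reduce the number of cache misses that require accesses to external memory. By combining the software and hardware caches, a relatively simple hardware-managed cache (e.g., direct-mapped) can achieve better energy savings and performance than a system with only one kind of cache. This study also examines the computing resources that a multimedia decoder (H.264) needs for real-time playback. Offline, static analysis combined with structural analysis of the video content yields the scenario-based worst-case execution time (Scenario-Based Worst Case Execution Time) under different scenarios. At run time, the resource scheduler applies the strategy derived offline for the content scenario that matches the video structure it detects dynamically, using system resources efficiently while giving users a smooth viewing experience. For the instruction access behavior of such large embedded applications, which have multiple execution hotspots, we propose a management policy that adjusts memory management dynamically according to the program's run-time needs. We also propose improvements to the management architecture: allowing the software-managed cache to use a replacement unit size different from that of main memory, to reduce the number of Translation Lookaside Buffer (TLB) misses; and separating out the page management table of the software-managed cache and placing it in the software-managed cache itself, to reduce the cost of handling TLB misses. By combining the management policy with these architectural improvements, we relieved bottlenecks in the H.264 multimedia application that existing techniques could not.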
Following the development of computing cores, memory systems are also moving toward heterogeneity, with more versatile memory hierarchies. To exploit the hardware components fully, system software design requires a deeper understanding of the application. With the ability to detect application requirements and an efficient address translation mechanism, we can allocate or distribute the needed code or data to the most suitable memory resource. Our study is a first step toward system software optimization for heterogeneous memory systems. We studied a heterogeneous memory architecture in which a hardware-controlled cache and a software-controlled SRAM cache (scratchpad memory, SPM) coexist. The traditional code selection decision used in an SPM-only system, which is based solely on access counts, cannot be applied directly to systems with both a cache and an SPM. In such a system, both code access frequency and cache misses are critical to performance and energy consumption. Expensive external memory accesses can be reduced either by mapping the code fragments within a working set to the SPM or by properly laying out the code fragments across different cache sets. With the help of the SPM, we can achieve better energy and performance behavior with a simpler cache organization. In this thesis, we also investigated the computation requirements of a multimedia codec (H.264) for meeting its real-time requirement. Using knowledge of the H.264 video structure, we statically analyzed the worst-case execution time of the H.264 decoder under different usage scenarios. At runtime, the resource scheduler can flexibly apply a suitable allocation strategy, obtained offline, according to the monitored video content. In this way we achieve better resource utilization while still providing users a pleasant viewing experience.
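As a rough illustration of why code layout matters with a direct-mapped cache, the sketch below computes which cache sets a code fragment occupies and checks whether two fragments collide. The cache geometry (32-byte lines, 128 sets), the example addresses, and all function names are assumptions made for this example; they are not parameters or code from the thesis.

```python
LINE_SIZE = 32   # bytes per cache line (assumed for illustration)
NUM_SETS = 128   # sets in the direct-mapped cache (assumed for illustration)


def cache_set(addr):
    """Set index of the cache line that contains byte address `addr`."""
    return (addr // LINE_SIZE) % NUM_SETS


def sets_touched(start, size):
    """Set indices covered by a code fragment of `size` bytes at `start`."""
    first_line = start // LINE_SIZE
    last_line = (start + size - 1) // LINE_SIZE
    return {line % NUM_SETS for line in range(first_line, last_line + 1)}


def conflicts(frag_a, frag_b):
    """True if two (start, size) fragments share at least one cache set."""
    return bool(sets_touched(*frag_a) & sets_touched(*frag_b))


# Two 64-byte fragments placed exactly one cache size (4096 bytes) apart
# map to the same sets and would evict each other when executed alternately.
clash = conflicts((0x1000, 64), (0x2000, 64))

# Shifting the second fragment by 64 bytes moves it to different sets.
no_clash = conflicts((0x1000, 64), (0x2040, 64))
```

In this toy setting, relocating a fragment by two cache lines is enough to remove the conflict, which is the kind of effect the layout adjustment described above relies on.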
For real, complex applications such as multimedia, which often exhibit multiple execution hotspots, we have extended our static software memory optimization strategy to respond dynamically to the application's needs. In this study, we propose a new SPM code selection policy based on how severely a code fragment causes cache conflicts, which covers both important factors: access frequency and cache misses. Moreover, the importance of code in large programs often varies over time across different program execution phases. In contrast to previous application-based dynamic SPM optimizations, we devise a phase detection method that enables finer-grained, phase-based dynamic management to determine which code to place in the SPM. Results indicate that we reduce the instruction memory energy-delay product by 18.6% and 65.1% compared with a cache-only system and a state-of-the-art approach for systems in which a cache and an SPM coexist, respectively.
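The abstract does not spell out the selection policy, so the following is only a hypothetical sketch of how a conflict-aware choice can differ from a purely access-count-based one. The energy costs, the benefit formula, the greedy benefit-per-byte heuristic, and the profile numbers are all assumptions introduced for illustration; none of them is taken from the thesis.

```python
from dataclasses import dataclass

# Per-event energy costs in arbitrary units (assumed values).
E_CACHE_HIT = 1.0
E_SPM = 0.5
E_EXTERNAL = 50.0


@dataclass
class Fragment:
    name: str
    size: int             # bytes
    accesses: int         # profiled instruction fetch count
    conflict_misses: int  # profiled cache misses attributed to conflicts


def benefit(f):
    """Estimated energy saved by serving `f` from the SPM instead of the cache."""
    return f.accesses * (E_CACHE_HIT - E_SPM) + f.conflict_misses * E_EXTERNAL


def select_for_spm(fragments, capacity):
    """Greedy pick by benefit per byte until the SPM capacity is used up."""
    chosen, free = [], capacity
    ranked = sorted(fragments, key=lambda f: benefit(f) / f.size, reverse=True)
    for f in ranked:
        if f.size <= free:
            chosen.append(f.name)
            free -= f.size
    return chosen


def energy_delay_product(energy, delay):
    """Energy-delay product: total energy multiplied by execution time."""
    return energy * delay


# Hypothetical profile: `hot_loop` is fetched most often, but `thrasher`
# causes many conflict misses that go to external memory.
EXAMPLE = [
    Fragment("hot_loop", 256, 10_000, 0),
    Fragment("thrasher", 256, 2_000, 400),
    Fragment("cold_init", 512, 100, 0),
]

picked = select_for_spm(EXAMPLE, capacity=256)
```

With a 256-byte SPM, a policy that ranks by access count alone would pick `hot_loop`, whereas this sketch picks `thrasher` because its conflict misses dominate the estimated cost under the assumed energy values.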
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079655811
http://hdl.handle.net/11536/125831
Appears in Collections: Thesis