Title: | Performance and Energy Optimization of Embedded Memory Systems |
Authors: | Huang, Chen-Wei (黄承威); Tsao, Shiao-Li (曹孝栎); Institute of Computer Science and Engineering |
Keywords: | Embedded Systems; Memory System; Heterogeneous Cache System |
Issue Date: | 2014 |
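Abstract: | With the trend toward heterogeneous computing cores, memory systems are also beginning to combine memory components with different access characteristics and to adopt more diverse and flexible memory hierarchies. To make full use of the hardware, system software designers must pay closer attention to what programs need at run time, designing efficient address translation mechanisms and run-time detection capabilities so that the required code and data can be allocated or moved in advance to the most suitable memory component. Our research is an initial attempt at system software optimization for such heterogeneous memory systems. The hardware architecture studied is a heterogeneous memory system in which a hardware-managed cache and a software-managed cache coexist. We found that, in such a system, the optimization decisions for classifying and managing code involve different considerations from those in a system with only a software-managed or only a hardware-managed cache. Besides classifying code by access count, it is more important to consider the misses that may arise in the software and hardware caches. In a system where software- and hardware-managed caches coexist, we can place code in the working set into the software-managed cache, or adjust addresses, to reduce the number of cache misses that require accesses to external memory. By using the software cache together with the hardware cache, we can use a simpler hardware-managed cache (such as a direct-mapped one) and still achieve better energy savings and performance than with only one kind of cache. This research investigates the computing resources a multimedia decoder (H.264) needs to achieve real-time playback. Offline, using static analysis combined with structural analysis of video content, we determine the scenario-based worst-case execution time for different scenarios. At run time, the resource scheduler takes the video content structure it detects dynamically and applies the strategy derived offline for that content scenario, achieving effective use of system resources and giving users a smooth viewing experience. For the instruction access behavior of such large embedded applications, which have multiple execution hotspots, we propose a management policy that dynamically adjusts memory management according to run-time needs. We also propose improvements to the management architecture: allowing the software-managed cache to use a replacement unit size different from that of main memory, to reduce the number of translation lookaside buffer (TLB) misses; and separating out the page management table of the software-managed cache and placing it in the software-managed cache itself, to reduce the cost of handling TLB misses. By combining the management policy with the architectural improvements, we relieved bottlenecks in the H.264 multimedia application that existing techniques could not improve. Following the development of computing cores, memory systems are also moving toward heterogeneity, with more versatile memory hierarchies. System software designers need a deeper understanding of applications to make the most of the hardware components. With the ability to detect application requirements and an efficient address translation mechanism, we can allocate or move the needed code and data to the most suitable memory resource. Our study is a first step toward system software optimization for heterogeneous memory systems. We studied a heterogeneous memory architecture in which a hardware-controlled SRAM cache and a software-controlled SRAM, or scratchpad memory (SPM), coexist. The traditional, purely access-based code selection used in SPM-only systems cannot be applied directly to systems with both a cache and an SPM. In such a system, both code access frequency and cache misses are critical to performance and energy consumption. 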
In such a coexisting system, expensive external memory accesses can be reduced either by mapping the code fragments of a working set to the SPM or by laying out the code fragments across different cache sets. With the help of the SPM, we can achieve better energy and performance behavior with a simpler cache organization. In this thesis, we investigated the computation a multimedia codec (H.264) requires to meet its real-time constraint. Using knowledge of the H.264 video structure, we statically analyzed the worst-case execution time of the H.264 decoder under different usage scenarios. At run time, the resource scheduler can flexibly apply a suitable allocation strategy, obtained offline, according to the monitored video content. In this way we achieve better resource utilization while still giving users a pleasant viewing experience. For real, complex applications such as multimedia, which often exhibit multiple execution hotspots, we extended our static software memory optimization strategy to respond dynamically to application needs. In this study, we propose a new SPM code selection policy based on the severity of the cache conflicts a code fragment causes, which covers both factors (access frequency and cache misses). Moreover, the importance of code in large programs often varies over time across program execution phases. In contrast to previous application-based dynamic SPM optimizations, we devise a phase detection method that enables finer-grained, phase-based dynamic management for determining which code to place in the SPM. Results indicate that we reduce the instruction memory energy-delay product by 18.6% and 65.1% relative to a fully cache-based system and a state-of-the-art approach for a system with coexisting cache and SPM, respectively. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079655811 http://hdl.handle.net/11536/125831 |
Appears in Collections: | Thesis |
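
The abstract argues that code selection for the SPM should weigh cache misses as well as access counts. The following is a minimal illustrative sketch of that idea, not the policy from the thesis: the `Fragment` fields, the `miss_penalty` weight, the greedy benefit-per-byte ranking, and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A profiled code fragment (all fields are hypothetical profile data)."""
    name: str
    size: int             # bytes the fragment would occupy in the SPM
    accesses: int         # profiled instruction fetch count
    conflict_misses: int  # profiled cache conflict misses attributed to it


def select_for_spm(fragments, spm_capacity, miss_penalty=10.0):
    """Greedily pick fragments for the SPM by estimated benefit per byte.

    The benefit is assumed to be accesses + miss_penalty * conflict_misses,
    so miss_penalty=0 degenerates to a purely access-based selection.
    """
    def benefit(f):
        return f.accesses + miss_penalty * f.conflict_misses

    ranked = sorted(fragments, key=lambda f: benefit(f) / f.size, reverse=True)
    chosen, used = [], 0
    for f in ranked:
        # Take a fragment only if it still fits in the remaining SPM space.
        if used + f.size <= spm_capacity:
            chosen.append(f.name)
            used += f.size
    return chosen


# Made-up sample profile: B is fetched less often than A but causes many conflicts.
FRAGMENTS = [
    Fragment("A", size=100, accesses=1000, conflict_misses=0),
    Fragment("B", size=100, accesses=200, conflict_misses=200),
    Fragment("C", size=200, accesses=500, conflict_misses=0),
]
```

With a 100-byte SPM, `select_for_spm(FRAGMENTS, 100)` chooses fragment B, whereas the access-only variant `select_for_spm(FRAGMENTS, 100, miss_penalty=0.0)` chooses fragment A, which is the kind of difference the abstract attributes to accounting for cache conflicts.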