Title: Heterogeneous Multi-core Full-System Simulation and a Cooperative Thread Array (CTA)-based Data Prefetching Mechanism
A Full System Simulation Framework for HSA and CTA-based Prefetch Mechanism
Authors: 郑亦呈
Cheng, I-Cheng
陈添福
Institute of Computer Science and Engineering
Keywords: heterogeneous multi-core; simulation; cooperative thread array; data prefetching; heterogeneous; HSA; prefetch
Publication date: 2015
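Abstract: As architecture design has advanced, multi-core processor design has moved from the traditional homogeneous multi-core processor era into the heterogeneous multi-core processor era. The Heterogeneous System Architecture (HSA) proposed by AMD integrates the CPU and GPU on a single chip, and the two share a common address space, hUMA (heterogeneous Uniform Memory Access). Through this shared address space, the CPU can directly access data on the GPU, and the GPU no longer needs to copy data out of the CPU's address space before computing. However, when the CPU and GPU share memory, many other resources are shared as well, including caches and buses, and cache coherence becomes an issue. Because HSA was proposed only recently, no complete simulation environment yet exists for studying these architectural problems. This thesis therefore proposes a complete simulation framework that combines a CPU simulator with a GPU simulator and implements a shared address space. Finally, we implement a cooperative thread array (CTA)-based data prefetching mechanism on the GPU side to improve GPU performance, and compare traditional CPU data prefetching with GPU data prefetching.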
Computer architecture is transitioning from the homogeneous multicore era into the heterogeneous multicore era. AMD has proposed the Heterogeneous System Architecture (HSA), which integrates CPUs and GPUs physically on one chip and provides a shared virtual address space between them. With shared virtual memory, the time spent moving data between the devices' disjoint memories can be saved. However, sharing also raises new resource management issues, such as shared last-level cache management, MMUs for the CPU and GPU, and main memory management. In addition, coherence between the CPU and GPU becomes a new issue. Yet no complete simulator exists that provides a platform for studying these issues.
In this thesis, we propose a full system simulation framework for HSA that combines a CPU model, QEMU, with a GPU model, GPGPU-Sim. For HSA, we support parts of the OpenCL 2.0 runtime and global memory segments with a shared address space between the CPU and GPU. We also compare the traditional CPU prefetching mechanism with the GPU prefetching mechanism and implement a CTA-based prefetching mechanism to improve the GPU's performance.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070256062
http://hdl.handle.net/11536/127191
Appears in Collections: Thesis