Title: Supporting Advanced Vector Extensions (AVX) on HBT-86
Original title: 在HBT-86上支援AVX指令集架构
Author: Liu, Shang-Wen (刘尚雯)
Shann, Jyh-Jiun (单智君)
Institute of Computer Science and Engineering
Keywords: Binary Translation; Hybrid Binary Translation; AVX ISA; LLVM; SIMD
Issue date: 2016
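Abstract: HBT-86 is a retargetable hybrid binary translation system built on the LLVM compiler. The source executables it supports may contain x86-32 and x86-64 integer instructions, x87 floating-point instructions, and a subset of the SSE (Streaming SIMD Extensions) family of SIMD (single instruction, multiple data) instructions. It can currently generate target executables that run on x86-32, x86-64, and ARM.
After SSE, Intel introduced the AVX (Advanced Vector Extensions) instruction set as an extension of SSE. HBT-86 does not yet support these instructions and therefore cannot translate executables that contain them. The main goal of this thesis is to design and implement AVX support in HBT-86. Our approach translates LLVM MC instructions into the LLVM intermediate representation (LLVM IR) and emulates the behavior of AVX instructions and the register environment at the LLVM IR level. In addition, this thesis extends HBT-86's emulation of SSE instructions, improves compatibility between SSE and AVX, and extends the system call handling in HBT-86.
In the experiments, we test a variety of benchmark programs that mainly execute AVX instructions. The baseline is Bochs, an emulator written in C++ that emulates every binary instruction in software and supports x86-32 and x86-64 source and target executables. With x86-32 source executables and an x86-64 target platform, the integer AVX benchmarks run 11.02 times as fast as under Bochs, and the floating-point AVX benchmarks run 14.35 times as fast. Relative to the native executables, for x86-32 to x86-64 translation, the execution time is 4.16 times as long for the integer benchmarks and 3.34 times as long for the floating-point benchmarks. Results of an emulated experiment with x86-32 source executables and an ARM target platform show that our system can successfully translate the AVX benchmarks into native ARM code and run it on the ARM platform.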
HBT-86 is an LLVM-based retargetable hybrid binary translation system. The source instruction set architectures (ISAs) supported by HBT-86 include x86-32 and x86-64 integer instructions, x87 floating-point instructions, and Streaming SIMD Extensions (SSE). HBT-86 can generate target binaries that run on x86-32, x86-64, and ARM platforms.
In recent years, Intel introduced Advanced Vector Extensions (AVX), a 256-bit instruction set extension to SSE. HBT-86 does not yet support the AVX ISA, so it cannot emulate binary executables that contain AVX instructions. Our research therefore aims to design and implement the emulation of AVX instructions on HBT-86. In this thesis, we translate LLVM Machine Code (MC) instructions into the LLVM intermediate representation (IR) and emulate the behavior of AVX instructions and registers in the translated code. We also extend the SSE support in HBT-86, improve its compatibility with AVX, and extend the system call support.
We compare our system with Bochs, a full emulator written in C++ that emulates every instruction in software. It supports x86-32 and x86-64 source and target executables. For AVX x86-32 to x86-64 emulation, HBT-86 is 11.02 and 14.35 times faster than Bochs on the integer and floating-point benchmarks, respectively. Compared with the native binary code, HBT-86 takes 4.16 and 3.34 times as long on the integer and floating-point benchmarks, respectively. Finally, HBT-86 can translate our AVX benchmarks into ARM binary code and execute that code successfully on an ARM platform.
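To make the idea of emulating AVX instructions and registers more concrete, the following is a minimal behavioural sketch in Python. It is not the thesis's implementation, which operates on LLVM IR. The class `YmmFile` and the three helper functions are hypothetical names introduced only for illustration. The sketch relies on one architectural fact that matters for SSE and AVX compatibility: a legacy SSE instruction leaves bits 255:128 of its destination YMM register unchanged, while a VEX-encoded 128-bit instruction zeroes them.

```python
import struct

MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1


class YmmFile:
    """Hypothetical guest register file: 16 YMM registers held as 256-bit ints."""

    def __init__(self):
        self.regs = [0] * 16


def _add_lanes32(a, b, lanes):
    """Lane-wise 32-bit wraparound addition over the low `lanes` lanes."""
    out = 0
    for i in range(lanes):
        x = (a >> (32 * i)) & 0xFFFFFFFF
        y = (b >> (32 * i)) & 0xFFFFFFFF
        out |= ((x + y) & 0xFFFFFFFF) << (32 * i)
    return out


def paddd_sse(rf, dst, src):
    """Legacy SSE PADDD xmm, xmm: bits 255:128 of dst are preserved."""
    low = _add_lanes32(rf.regs[dst] & MASK128, rf.regs[src] & MASK128, 4)
    rf.regs[dst] = (rf.regs[dst] & ~MASK128 & MASK256) | low


def vpaddd_vex128(rf, dst, src1, src2):
    """VEX.128 VPADDD xmm, xmm, xmm: bits 255:128 of dst are zeroed."""
    rf.regs[dst] = _add_lanes32(
        rf.regs[src1] & MASK128, rf.regs[src2] & MASK128, 4
    )


def vaddps_256(rf, dst, src1, src2):
    """VEX.256 VADDPS ymm, ymm, ymm over eight single-precision lanes.

    Simplification: MXCSR rounding modes, exception flags, and overflow
    to infinity are not modelled (struct.pack raises on overflow).
    """
    a = struct.unpack("<8f", rf.regs[src1].to_bytes(32, "little"))
    b = struct.unpack("<8f", rf.regs[src2].to_bytes(32, "little"))
    packed = struct.pack("<8f", *(x + y for x, y in zip(a, b)))
    rf.regs[dst] = int.from_bytes(packed, "little")
```

Storing each register as one 256-bit integer keeps the upper-half rule explicit. A translator working at the IR level would more plausibly use vector types such as `<8 x float>`, but that choice is an assumption here, not something the abstract states.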
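The two sets of ratios reported above can be combined for a rough sense of scale. The short calculation below is only illustrative: it assumes that both ratios were measured on the same benchmarks and that the averages compose multiplicatively, which the abstract does not state. The function name `implied_slowdown` is hypothetical, and the resulting figures are derived estimates, not measurements from the thesis.

```python
def implied_slowdown(speedup_over_bochs, slowdown_vs_native):
    """Estimate Bochs time relative to native time.

    If t_hbt = slowdown_vs_native * t_native and
    t_bochs = speedup_over_bochs * t_hbt, then
    t_bochs / t_native is the product of the two ratios.
    """
    return speedup_over_bochs * slowdown_vs_native


# Ratios taken from the abstract (x86-32 source, x86-64 target).
integer_estimate = implied_slowdown(11.02, 4.16)
floating_point_estimate = implied_slowdown(14.35, 3.34)

print(f"integer: Bochs ~{integer_estimate:.1f}x native time")
print(f"floating-point: Bochs ~{floating_point_estimate:.1f}x native time")
```

Under these assumptions, Bochs would take roughly 46 to 48 times the native execution time on these benchmarks.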
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356085
http://hdl.handle.net/11536/140061
Appears in collections: Thesis