Translating the ARM Neon and VFP instructions in a binary translator

doi:10.1002/spe.2394

Full metadata record

DC Field	Value	Language
dc.contributor.author	Guo, Yu-Chuan	en_US
dc.contributor.author	Yang, Wuu	en_US
dc.contributor.author	Chen, Jiunn-Yeu	en_US
dc.contributor.author	Lee, Jenq-Kuen	en_US
dc.date.accessioned	2017-04-21T06:56:08Z	-
dc.date.available	2017-04-21T06:56:08Z	-
dc.date.issued	2016-12	en_US
dc.identifier.issn	0038-0644	en_US
dc.identifier.uri	http://dx.doi.org/10.1002/spe.2394	en_US
dc.identifier.uri	http://hdl.handle.net/11536/132790	-
dc.description.abstract	Binary translation emulates one instruction set with another on the same or a different platform. This important technique is widely used in modern software. Vector and floating-point instructions are widely used in many applications, including multimedia, graphics, and gaming. Although these instructions are usually simulated with software in a binary translator, it is important to support them such that the host single-instruction, multiple-data (SIMD) and floating-point hardware is used efficiently during emulation. We report our design and implementation of the emulation of ARM Neon and vector floating point (VFP) instructions in the machine-code-to-low-level-virtual-machine (MC2LLVM) binary translator. The Neon and VFP instructions are first translated into carefully chosen sequences of LLVM intermediate representation (IR); the IR sequences are then optimized and translated into the host native binary by the existing LLVM backend. Because MC2LLVM makes use of the vector and floating-point types in LLVM IR, the generated host native binary can take full advantage of the vector and floating-point functional units, if present, of the host machine. To be fully compliant with the Neon and VFP instruction sets, all the features are supported, including the flush-to-zero mode, the default not-a-number (NaN) mode, and floating-point exceptions. The experimental results show that code generated by MC2LLVM with the Neon and VFP extensions achieves an average speedup of 1.174x on the SPEC 2006 benchmark suites and 12.05x the floating-point throughput in LINPACK, compared with code generated by MC2LLVM without the Neon and VFP extensions. Furthermore, MC2LLVM is 3.36x faster than QEMU for processing Neon/VFP instructions. Copyright (c) 2016 John Wiley & Sons, Ltd.	en_US
dc.language.iso	en_US	en_US
dc.subject	binary translation	en_US
dc.subject	cloud computing	en_US
dc.subject	LLVM	en_US
dc.subject	floating-point instruction	en_US
dc.subject	Neon	en_US
dc.subject	vector instruction	en_US
dc.subject	VFP	en_US
dc.subject	virtualization	en_US
dc.title	Translating the ARM Neon and VFP instructions in a binary translator	en_US
dc.identifier.doi	10.1002/spe.2394	en_US
dc.identifier.journal	SOFTWARE-PRACTICE & EXPERIENCE	en_US
dc.citation.volume	46	en_US
dc.citation.issue	12	en_US
dc.citation.spage	1591	en_US
dc.citation.epage	1615	en_US
dc.contributor.department	交大名義發表	zh_TW
dc.contributor.department	National Chiao Tung University	en_US
dc.identifier.wosnumber	WOS:000387367600001	en_US