Full metadata record
DC Field | Value | Language
dc.contributor.author | Guo, Yu-Chuan | en_US
dc.contributor.author | Yang, Wuu | en_US
dc.contributor.author | Chen, Jiunn-Yeu | en_US
dc.contributor.author | Lee, Jenq-Kuen | en_US
dc.date.accessioned | 2017-04-21T06:56:08Z | -
dc.date.available | 2017-04-21T06:56:08Z | -
dc.date.issued | 2016-12 | en_US
dc.identifier.issn | 0038-0644 | en_US
dc.identifier.uri | http://dx.doi.org/10.1002/spe.2394 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/132790 | -
dc.description.abstract | Binary translation emulates one instruction set with another, on the same or a different platform, and is widely used in modern software. Vector and floating-point instructions appear in many applications, including multimedia, graphics, and gaming. Although binary translators usually simulate these instructions in software, it is important to support them so that the host's single-instruction, multiple-data (SIMD) and floating-point hardware is used efficiently during emulation. We report the design and implementation of the emulation of ARM Neon and vector floating-point (VFP) instructions in the machine-code-to-low-level-virtual-machine (MC2LLVM) binary translator. Neon and VFP instructions are first translated into carefully chosen sequences of LLVM intermediate representation (IR); these IR sequences are then optimized and translated into host native binary code by the existing LLVM backend. Because MC2LLVM uses the vector and floating-point types of LLVM IR, the generated host binary can take full advantage of the host machine's vector and floating-point functional units, if present. To be fully compliant with the Neon and VFP instruction sets, all features are supported, including flush-to-zero mode, default not-a-number (NaN) mode, and floating-point exceptions. Experimental results show that, compared with code generated by MC2LLVM without the Neon and VFP extensions, code generated with them achieves an average speedup of 1.174x on the SPEC 2006 benchmark suites and 12.05x the floating-point throughput on LINPACK. Furthermore, MC2LLVM is 3.36x faster than QEMU at processing Neon/VFP instructions. Copyright (c) 2016 John Wiley & Sons, Ltd. | en_US
dc.language.iso | en_US | en_US
dc.subject | binary translation | en_US
dc.subject | cloud computing | en_US
dc.subject | LLVM | en_US
dc.subject | floating-point instruction | en_US
dc.subject | Neon | en_US
dc.subject | vector instruction | en_US
dc.subject | VFP | en_US
dc.subject | virtualization | en_US
dc.title | Translating the ARM Neon and VFP instructions in a binary translator | en_US
dc.identifier.doi | 10.1002/spe.2394 | en_US
dc.identifier.journal | SOFTWARE-PRACTICE & EXPERIENCE | en_US
dc.citation.volume | 46 | en_US
dc.citation.issue | 12 | en_US
dc.citation.spage | 1591 | en_US
dc.citation.epage | 1615 | en_US
dc.contributor.department | Published under the NCTU name (交大名義發表) | zh_TW
dc.contributor.department | National Chiao Tung University | en_US
dc.identifier.wosnumber | WOS:000387367600001 | en_US
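The abstract states that a translated Neon/VFP instruction must honour the flush-to-zero (FZ) and default-NaN (DN) modes. The following is a minimal reference-model sketch of what a translated lane-wise `VADD.F32` on a 128-bit Q register has to compute under those two modes. It is not MC2LLVM's code, which emits LLVM IR rather than Python. The sketch makes these assumptions: a Q register is modelled as four 32-bit lane bit patterns; rounding is round-to-nearest-even (ARMv7 Advanced SIMD arithmetic behaves as if FZ and DN are enabled with round-to-nearest); and exception flags are ignored. The helper names (`to_f32`, `flush`, `neon_vadd_f32`, etc.) are hypothetical. `0x7FC00000` is ARM's single-precision default NaN.

```python
import math
import struct

DEFAULT_NAN_BITS = 0x7FC00000  # ARM single-precision default NaN (quiet, positive, zero payload)
FLT_MIN_NORMAL = 2.0 ** -126   # smallest positive normal binary32 value


def bits_f32(b: int) -> float:
    """Reinterpret a 32-bit pattern as a binary32 value (held in a Python float)."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def f32_bits(x: float) -> int:
    """Reinterpret a binary32-representable value as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_f32(x: float) -> float:
    """Round a binary64 value to binary32 (round-to-nearest-even)."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # struct refuses values that round to infinity; the correct result is a signed infinity.
        return math.copysign(math.inf, x)


def flush(x: float) -> float:
    """Flush-to-zero: replace a subnormal binary32 value by a zero of the same sign."""
    if x != 0.0 and abs(x) < FLT_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def neon_vadd_f32(qn: list[int], qm: list[int]) -> list[int]:
    """Hypothetical model of VADD.F32 Qd, Qn, Qm with FZ and DN semantics (lane bit patterns in/out)."""
    assert len(qn) == len(qm) == 4
    out = []
    for a_bits, b_bits in zip(qn, qm):
        a = flush(bits_f32(a_bits))  # FZ: subnormal inputs are treated as zero
        b = flush(bits_f32(b_bits))
        r = flush(to_f32(a + b))     # binary64 sum of two binary32 values rounds correctly to binary32
        # DN: any NaN result (NaN operand or invalid operation) becomes the default NaN
        out.append(DEFAULT_NAN_BITS if math.isnan(r) else f32_bits(r))
    return out
```

The lanes are returned as bit patterns rather than Python floats because NaN payloads are not guaranteed to survive a round trip through a Python float. This also matches how a translator would store results back into an emulated register file. For example, `neon_vadd_f32([0x3F800000] * 4, [0x40000000] * 4)` adds 1.0 and 2.0 in every lane and yields `[0x40400000] * 4` (3.0).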
Appears in Collections: Articles