Title: On the Binary Translation of ARM/Thumb Mixed ISA Binaries
Authors: 陳俊宇
Chen, Jiunn-Yeu
楊武
Institute of Computer Science and Engineering
Keywords: Binary Translation
Issue Date: 2015
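Abstract: The ARM and Thumb instruction sets consist of 32-bit and 16-bit instructions, respectively. A program may contain only ARM instructions, or both ARM and Thumb instructions. Because a mixed ARM/Thumb program contains instructions of different lengths, it may sound similar to a traditional complex instruction set (CISC) program, but the two differ fundamentally in two respects. First, in a CISC instruction set such as x86, instructions of different lengths are interleaved, so consecutive instructions may have different lengths. In a mixed ARM/Thumb program, instructions of each length form their own regions, and only a limited set of branch instructions can switch between the two instruction sets. Second, in x86 every instruction has a unique encoding, whereas the ARM and Thumb encodings overlap. Decoding a given byte sequence as x86 yields exactly one instruction sequence, but decoding it as ARM and as Thumb may yield two valid instruction sequences. The central problem is then deciding which decoding is the correct one.
In this thesis, we propose dual translation to ensure that translated mixed ARM/Thumb programs execute correctly. In addition, we use a code-visiting analysis (訪碼分析) to identify incorrect translations; eliminating them reduces the size of the translated program and makes it more efficient.
We also propose an efficient automatic validation tool that checks the machine state and the data stored to memory. The tool supports all kinds of binary translation systems: static, dynamic, and hybrid.
Finally, we propose a control-flow-based partitioning method that splits the translated program into several small units, so that optimization does not cause excessively long translation times. Since partitioning may degrade program performance, the partitioning process avoids such losses as far as possible.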
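To make the encoding overlap and the dual-translation idea concrete, here is a minimal Python sketch; it is not the thesis's implementation. It assumes the original 16-bit-only Thumb ISA (no 32-bit Thumb-2 encodings), treats the per-instruction translators as opaque callbacks, and the helper names (`arm_words`, `thumb_halfwords`, `dual_translate`, `dispatch`) are hypothetical. The four bytes `00 00 A0 E1` read as one ARM word, 0xE1A00000 (`mov r0, r0`), but also as two Thumb halfwords, 0x0000 and 0xE1A0 (the latter an unconditional Thumb branch). Dual translation keeps a translation for both readings and lets the run-time processor state (the T bit) pick one.

```python
import struct
from typing import Callable

# Simplifying assumption: "Thumb" means the original 16-bit-only Thumb ISA (no Thumb-2).

def arm_words(code: bytes, base: int) -> list[tuple[int, int]]:
    """Read the bytes as 32-bit little-endian ARM instruction words: (address, word)."""
    usable = len(code) - len(code) % 4
    return [(base + i, struct.unpack_from("<I", code, i)[0]) for i in range(0, usable, 4)]

def thumb_halfwords(code: bytes, base: int) -> list[tuple[int, int]]:
    """Read the SAME bytes as 16-bit little-endian Thumb instructions: (address, halfword)."""
    usable = len(code) - len(code) % 2
    return [(base + i, struct.unpack_from("<H", code, i)[0]) for i in range(0, usable, 2)]

def dual_translate(code: bytes, base: int,
                   translate_arm: Callable[[int], str],
                   translate_thumb: Callable[[int], str]) -> dict[tuple[int, str], str]:
    """Translate an ambiguous region under BOTH readings, keyed by (address, state)."""
    table: dict[tuple[int, str], str] = {}
    for addr, word in arm_words(code, base):
        table[(addr, "ARM")] = translate_arm(word)
    for addr, half in thumb_halfwords(code, base):
        table[(addr, "Thumb")] = translate_thumb(half)
    return table

def dispatch(table: dict[tuple[int, str], str], pc: int, t_bit: bool) -> str:
    """At run time the T bit of the status register selects which translation runs."""
    return table[(pc, "Thumb" if t_bit else "ARM")]

if __name__ == "__main__":
    region = bytes([0x00, 0x00, 0xA0, 0xE1])  # 0xE1A00000 stored little-endian
    table = dual_translate(region, 0x8000,
                           lambda w: f"arm:{w:08x}", lambda h: f"thumb:{h:04x}")
    print(dispatch(table, 0x8000, t_bit=False))  # arm:e1a00000
    print(dispatch(table, 0x8002, t_bit=True))   # thumb:e1a0
```

The cost of this design is code size: every ambiguous region is translated twice, which is why pruning translations that can never execute pays off.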
The ARM architecture dominates the embedded-system market. It supports two different ISAs: the ARM ISA, which is composed of 32-bit instructions, and the Thumb ISA, which is composed of 16-bit instructions. An ARM binary can consist of ARM instructions only, or of intermixed ARM and Thumb instructions. There are two fundamental differences between ARM/Thumb mixed binaries and CISC binaries. The first difference is the layout of the binary. In x86 binaries, instructions of various lengths are intermixed. In ARM/Thumb mixed ISA binaries, by contrast, ARM instructions and Thumb instructions are grouped into separate regions. The second difference is the encoding space. All x86 instructions belong to one encoding space, so a byte stream can be decoded into exactly one instruction sequence. In contrast, the ARM ISA and the Thumb ISA share the same encoding space, which means the same byte stream can be decoded either as ARM instructions or as Thumb instructions. The ARM processor does not confuse the two, because a program status register records the current processor state, and the processor state determines how the byte stream is decoded. Whenever control flow switches to a region of the other ISA, the processor state changes as well. A static binary translator, however, does not know the processor state for a given instruction when it translates an ARM/Thumb mixed ISA binary. This work aims to translate ARM/Thumb mixed ISA binaries correctly and to optimize the translated binaries where possible. To validate the translation of ARM/Thumb mixed ISA binaries, we also propose an automatic tool that validates the translated binary. Because binary translators are complex, manual checking is time-consuming and error-prone, so an automatic validation tool is needed to detect mistranslated instructions. The validation tool can aid static, dynamic, and hybrid binary translators. We also propose methods to speed up the validator.
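As an illustration of the kind of check such a validator performs, the following Python sketch compares a reference execution trace (for example from an emulator of the source binary) with a trace from the translated binary and reports the first divergence. It is a hypothetical sketch, not the thesis's tool: the `Checkpoint` record, its fields, and the assumption that both runs are sampled at the same source-program addresses are ours. Following the abstract, each checkpoint holds the machine state (PC and registers) and the memory stores made since the previous checkpoint.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Checkpoint:
    """Hypothetical snapshot taken when execution reaches a source-program address."""
    pc: int                  # source-program address of the checkpoint
    regs: tuple[int, ...]    # guest registers r0..r15, mapped back from the host
    stores: list[tuple[int, int, int]] = field(default_factory=list)  # (addr, size, value) since last checkpoint

def first_divergence(reference: list[Checkpoint], translated: list[Checkpoint]):
    """Return (index, what, expected, actual) for the first mismatch, or None if the traces agree."""
    for i, (ref, got) in enumerate(zip(reference, translated)):
        if ref.pc != got.pc:
            return i, "pc", ref.pc, got.pc
        for r, (expected, actual) in enumerate(zip(ref.regs, got.regs)):
            if expected != actual:
                return i, f"r{r}", expected, actual
        if ref.stores != got.stores:  # same stores, same order
            return i, "stores", ref.stores, got.stores
    if len(reference) != len(translated):
        # One trace ended early or ran past the other, e.g. the translated program crashed.
        return min(len(reference), len(translated)), "length", len(reference), len(translated)
    return None
```

Comparing at checkpoints rather than after every instruction keeps the overhead low, and the first reported checkpoint narrows the search for the mistranslated instruction to the code executed since the previous one.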
Moreover, a carefully designed validation tool can easily be migrated to support different ISAs, and several tips are provided for modifying the validator to support other ISAs. One advantage of static binary translation is that aggressive optimizations can be applied. However, whole-program optimization may take excessive time as the source program grows larger. Partitioning the program into smaller units helps reduce the optimization time; nevertheless, improperly breaking up a code block may cause serious slowdowns. To preserve the performance of the translated program while reducing the optimization time, we propose a control-flow-based partitioning method that partitions the source program carefully.
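One plausible way to realize a control-flow-based partition, sketched here under our own assumptions rather than as the thesis's algorithm, is to never split a loop: compute the strongly connected components of the control-flow graph with Tarjan's algorithm, so every loop lands in one component, then pack components greedily in topological order until a size budget is reached. The block IDs, block sizes, and budget are hypothetical inputs, and every block must appear as a key of `succ`.

```python
from __future__ import annotations

def strongly_connected_components(succ: dict[int, list[int]]) -> list[list[int]]:
    """Tarjan's algorithm; SCCs come out in reverse topological order of the condensed CFG."""
    index: dict[int, int] = {}
    low: dict[int, int] = {}
    on_stack: set[int] = set()
    stack: list[int] = []
    sccs: list[list[int]] = []
    counter = 0

    def visit(v: int) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop it off the stack
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in succ:
        if v not in index:
            visit(v)
    return sccs

def partition(succ: dict[int, list[int]], size: dict[int, int], budget: int) -> list[list[int]]:
    """Group basic blocks into units of roughly `budget` size without cutting through a loop."""
    units: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for comp in reversed(strongly_connected_components(succ)):  # topological order
        comp_size = sum(size[b] for b in comp)
        if current and current_size + comp_size > budget:
            units.append(current)
            current, current_size = [], 0
        current.extend(sorted(comp))
        current_size += comp_size
    if current:
        units.append(current)
    return units
```

For a CFG `0 -> 1 -> 2 -> 1` (a loop) with `2 -> 3 -> 4`, block size 10, and budget 25, this yields `[[0], [1, 2], [3, 4]]`: the loop stays in one unit. A loop larger than the budget is kept whole as an oversized unit instead of being split; for very large CFGs, an iterative version of Tarjan's algorithm would avoid Python's recursion limit.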
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079755819
http://hdl.handle.net/11536/126926
Appears in Collections: Thesis