Title: | A Retargetable Hybrid Binary Translation System (可重定目標的混和型二元碼轉譯系統) |
Authors: | 游竣彥 楊武 Institute of Computer Science and Engineering |
Keywords: | Binary Translation;LLVM;ARM;X86;Linux |
Issue Date: | 2011 |
Abstract: | Static binary translation (SBT) systems and dynamic binary translation (DBT) systems each have their own merits and disadvantages. An SBT system can perform whole-program optimization and does not incur run-time translation overhead. However, it is hard for a static translator to predict a program's run-time behavior, such as the targets of indirect branches. If a binary contains handcrafted assembly, an SBT system may not find all branch addresses. One solution is to use an address mapping table that maps every address in the code segment, but this increases the size of the translated binary significantly and slows down program execution. A DBT system, on the other hand, can optimize a program based on its execution environment and can handle self-modifying code and indirect branches easily. Because translation time is part of the execution time, a DBT system cannot afford time-consuming optimization algorithms, so the quality of the code it generates is not as good as that generated by an SBT system. We present a hybrid binary translation (HBT) system that combines the merits of both SBT and DBT. It leverages the LLVM compiler infrastructure to translate and optimize the source-platform binary code and then generate target-platform binary code. The system first translates the binary statically. If a run-time exception occurs during execution, the HBT system switches to dynamic translation. In this way we can safely translate binaries that contain handcrafted assembly and keep good performance without mapping all addresses. In our ARM-to-x86 experiments, some SPEC CINT 2006 benchmarks run 2 to 5 times faster than under QEMU. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079855518 http://hdl.handle.net/11536/48252 |
Appears in Collections: | Thesis |
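
Note: The static-first, dynamic-on-exception flow described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the thesis's LLVM-based implementation: guest blocks are modeled as plain Python functions, "translation" is just copying a block into a cache, and every name here (`HybridTranslator`, `static_cache`, `dynamic_cache`, `block_entry`, `block_hidden`, the addresses `0x1000` and `0x2000`) is hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

State = Dict[str, int]
# A guest block runs against the state and returns the next guest address,
# or None to halt. (Hypothetical stand-in for a block of source machine code.)
Block = Callable[[State], Optional[int]]


class HybridTranslator:
    """Toy model: statically translated blocks plus a dynamic fallback."""

    def __init__(self, guest: Dict[int, Block], static_addrs: Iterable[int]):
        self.guest = guest  # stand-in for the source binary
        # "Static translation": only addresses discovered ahead of time.
        self.static_cache: Dict[int, Block] = {a: guest[a] for a in static_addrs}
        # Blocks translated on demand at run time.
        self.dynamic_cache: Dict[int, Block] = {}
        self.log: List[str] = []

    def lookup(self, addr: int) -> Block:
        if addr in self.static_cache:
            self.log.append(f"static:{addr:#x}")
            return self.static_cache[addr]
        if addr not in self.dynamic_cache:
            # The target was not found statically. This models the run-time
            # exception that switches the system to dynamic translation.
            self.log.append(f"translate:{addr:#x}")
            self.dynamic_cache[addr] = self.guest[addr]
        else:
            self.log.append(f"dynamic:{addr:#x}")
        return self.dynamic_cache[addr]

    def run(self, entry: int, state: State) -> State:
        addr: Optional[int] = entry
        while addr is not None:
            addr = self.lookup(addr)(state)
        return state


def block_entry(s: State) -> Optional[int]:
    s["acc"] += 1
    return s["target"]  # indirect branch: target known only at run time


def block_hidden(s: State) -> Optional[int]:
    s["acc"] *= 10
    s["n"] -= 1
    return 0x2000 if s["n"] > 0 else None  # loop once more, then halt


guest = {0x1000: block_entry, 0x2000: block_hidden}
# Assume the static pass found only the entry block.
hbt = HybridTranslator(guest, static_addrs=[0x1000])
final = hbt.run(0x1000, {"acc": 0, "target": 0x2000, "n": 2})
print(final["acc"], hbt.log)
```

In this sketch only the block missed by the static pass pays the on-demand translation cost, and it pays it once; no table covering every guest address is needed.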