Title: 適用於多媒體應用之雙核心處理器架構設計及軟體開發
Architecture Design of Dual-Core Multimedia Processor and its SW Porting
Authors: 歐士豪
Shih-Hao Ou
劉志尉
Chih-Wei Liu
Institute of Electronics
Keywords: Digital Signal Processor; Dual-core processor
Issue Date: 2004
Abstract: Traditional single-core processors can no longer satisfy the growing computational requirements of next-generation, media-rich communication systems. Multi-core architectures have recently emerged as an effective way to deliver high performance at reasonable cost and power consumption. Among them, the dual-core processor, consisting of a RISC core and a DSP core, has attracted wide attention because of the task divergence in most embedded systems, which run both control-oriented and computation-intensive tasks. The RISC core handles the control-oriented tasks and the DSP core handles high-performance computation, so each core can be optimized separately for its own workload. However, directly combining two independent off-the-shelf processor cores duplicates functionality and incurs extra cost. In this thesis, we design a reduced DSP core from scratch that focuses on data computation and removes the functionality that overlaps between the RISC core and the DSP core. We propose a data-centric instruction set architecture (ISA) that helps the DSP core achieve smooth data flow and thereby approach the performance of a typical ASIC. The latency-insensitive micro-architecture allows functional modules with different latencies to be replaced and combined freely. Adapting to a different hardware configuration requires only a simple change in the software tools, namely updating the latency of each functional unit used during static operation scheduling; this incurs no extra hardware cost and does not affect the other hardware blocks. In addition, we have developed a complete tool chain that automatically compiles a high-level description into executable machine code. We further design a dual-core processor that integrates one ARM926EJ-S RISC core and the proposed compact DSP core. We also build a prototype system on the ARM Versatile platform baseboard and successfully port an MP3 player to it, with the two cores working together.
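The idea of adapting to new hardware by changing only the scheduler's latency table can be illustrated with a minimal sketch. This is not the thesis's tool chain: the unit names, the latency values, the operation list, and the `schedule` function are all hypothetical, and the sketch assumes fully pipelined functional units that accept one operation per cycle.

```python
from typing import Dict, List, Tuple

# Hypothetical operation list: (name, functional unit, names of operations it depends on).
# Operations must be listed in dependency (topological) order.
Op = Tuple[str, str, List[str]]

OPS: List[Op] = [
    ("a", "load", []),
    ("b", "load", []),
    ("p", "mul", ["a", "b"]),
    ("s", "add", ["p"]),
]


def schedule(ops: List[Op], latency: Dict[str, int]) -> Dict[str, int]:
    """Return the issue cycle of each operation for a given latency table.

    Assumption: each unit is fully pipelined and issues at most one
    operation per cycle; a result is available `latency[unit]` cycles
    after issue.
    """
    issue: Dict[str, int] = {}
    finish: Dict[str, int] = {}
    unit_free: Dict[str, int] = {}  # earliest cycle each unit can issue again
    for name, unit, deps in ops:
        ready = max((finish[d] for d in deps), default=0)
        start = max(ready, unit_free.get(unit, 0))
        issue[name] = start
        finish[name] = start + latency[unit]
        unit_free[unit] = start + 1
    return issue


# Swapping in a slower multiplier changes only the table, not the scheduler.
fast = schedule(OPS, {"load": 1, "mul": 2, "add": 1})
slow = schedule(OPS, {"load": 1, "mul": 3, "add": 1})
print(fast)
print(slow)
```

In this sketch the dependent add simply issues one cycle later when the multiplier latency grows from 2 to 3 cycles, which mirrors the claim that a module swap is absorbed by rescheduling in software.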
In the 2D DCT round-off error analysis, the proposed DSP core in its 24-bit configuration achieves 64 dB PSNR relative to single-precision floating point. On several popular DSP kernels, it outperforms the Analog Devices ADSP-218x, which has similar hardware resources, by almost a factor of three in estimated execution cycles. The proposed DSP core has been fabricated in UMC 1P6M CMOS technology; it operates at 314.5 MHz with an average power consumption of 52 mW, and its core size is 1.5 × 1.5 mm².
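A round-off error figure of this kind is typically obtained by comparing a fixed-point result with a floating-point reference. The sketch below assumes the common definition PSNR = 10·log10(peak² / MSE); the record does not state the exact definition, peak value, or number format used in the thesis, so the `quantize` and `psnr` helpers, the choice of 23 fractional bits for a 24-bit word, and the test signal are all illustrative assumptions.

```python
import math
from typing import List


def quantize(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (fixed-point round-off)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def psnr(reference: List[float], test: List[float], peak: float) -> float:
    """PSNR in dB, assuming PSNR = 10*log10(peak**2 / MSE)."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical reference signal in [-1, 1]; not data from the thesis.
signal = [math.sin(0.1 * i) for i in range(64)]

# Assumed formats: 1 sign bit plus 15 or 23 fractional bits.
psnr_16 = psnr(signal, [quantize(v, 15) for v in signal], peak=1.0)
psnr_24 = psnr(signal, [quantize(v, 23) for v in signal], peak=1.0)
print(psnr_16, psnr_24)
```

This only measures the error of quantizing a stored signal. A full analysis such as the 2D DCT experiment would also accumulate rounding error through every arithmetic step, so the resulting PSNR would be lower than the values this sketch prints.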
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009211684
http://hdl.handle.net/11536/67612
Appears in Collections: Thesis