Title: Multithreaded DSP Coprocessor for Multimedia Applications
Authors: Hsiang-Sheng Teng (鄧翔升)
Chih-Wei Liu (劉志尉)
Institute of Electronics
Keywords: Multithreading; Coprocessor Interface
Issue Date: 2007
Abstract: Embedded applications are increasingly diverse and complex, and heterogeneous dual-core or multi-core SoCs that pair a microprocessor unit (MPU, typically a RISC core) with a digital signal processor (DSP) are accepted as a cost-effective way to meet the growing computation demands of mobile multimedia. TI DaVinci, for example, is one popular dual-core platform in which the DSP acts as a slave or coprocessor and performs the computation-intensive tasks requested by the host processor. However, inefficient DSP task management, including task scheduling and task dispatch, incurs inter-processor communication (IPC) overhead and thus lowers DSP performance. To improve DSP performance, this thesis proposes a multithreaded DSP coprocessor for multimedia applications. The coprocessor consists of a smart coprocessor interface (SCI), which handles communication and coordination with the MPU, and an application-specific interleaved multithreaded datapath. The SCI dynamically manages DSP tasks and effectively reduces IPC overhead, while the interleaved multithreaded datapath exploits thread-level parallelism (TLP) to tolerate the full pipeline latency, so that the DSP exhibits zero pipeline latency. To evaluate the effect of the SCI on DSP performance, a dual-core SystemC transaction-level virtual platform is built with the CoWare electronic system level (ESL) design platform. When encoding a 256x256 JPEG image, the SCI reduces total execution time by about 68% compared with task management performed on the MPU under an operating system (such as Linux), and by about 15% compared with task management performed on the DSP under a μ-kernel or RTOS. The proposed DSP datapath cascades an adder (A), a multiplier (M), an accumulator (A), and a shifter (S). This composite A-M-A-S datapath can perform operations common in multimedia applications, such as addition-multiplication (AM), multiplication-accumulation (MA), and addition-multiplication-accumulation-shift (AMAS). The proposed multithreaded DSP coprocessor is implemented in TSMC 0.13 μm CMOS technology. The implementation operates at 250 MHz with an average power consumption of 40 mW. The chip area is 2.7 x 2.7 mm², and the SCI occupies only about 0.65% of the total chip area.
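The cascaded A-M-A-S datapath named in the abstract can be illustrated with a small behavioral sketch. This is not the thesis's implementation: the function names (`amas`, `am`, `ma`), the operand order, the use of unbounded Python integers instead of fixed-width registers, and the way unused stages are bypassed (adding 0, shifting by 0) are all assumptions made for illustration.

```python
def amas(a, b, c, acc=0, shift=0):
    """Behavioral model of one pass through an add -> multiply ->
    accumulate -> shift chain. Hypothetical signature; word widths,
    saturation, and rounding are not modeled."""
    s = a + b            # adder stage
    p = s * c            # multiplier stage
    t = acc + p          # accumulator stage
    return t >> shift    # shifter stage (arithmetic right shift)


def am(a, b, c):
    """Addition-multiplication: (a + b) * c, with accumulate and shift bypassed."""
    return amas(a, b, c)


def ma(a, b, acc):
    """Multiplication-accumulation: acc + a * b, with the adder bypassed (b operand = 0)."""
    return amas(a, 0, b, acc=acc)
```

Under these assumptions, `am` and `ma` are just the full chain with some stages fed neutral operands, which is one way to read how a single composite datapath could cover several common multimedia operations.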
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009411637
http://hdl.handle.net/11536/80549
Appears in Collections: Thesis