Title: | 適於雙核心多媒體系統晶片之多重執行緒協同處理器介面 Multithreaded Coprocessor Interface for Dual-Core Multimedia SoC |
Authors: | 卓志宏 Chih-Hung Cho 劉志尉 Chih-Wei Liu College of Electrical and Computer Engineering, IC Design Industry Program (電機學院IC設計產業專班) |
Keywords: | Multithreading; Processor interface |
Issue Date: | 2006 |
Abstract: | In embedded systems for computation-intensive mobile multimedia applications, the diversity of tasks makes a heterogeneous dual-core or multi-core platform, i.e. one that integrates a RISC processor and a DSP, a cost-effective solution. TI OMAP, for example, is a widely used dual-core platform in which the DSP acts as the slave and performs the computation-intensive tasks dispatched by the host (master) processor, i.e. the RISC. DSP utilization on such platforms is usually low, however: for a typical multimedia application such as JPEG encoding, it is about 50~60%. The low utilization can be attributed to three causes: (1) pipeline stalls caused by instruction latency; (2) limited instruction-level parallelism (ILP); and (3) the overhead of inter-processor communication (IPC) and process/thread/task management.
To deliver performance beyond single-thread ILP, most modern high-performance DSPs are multithreaded or multi-core processors. For thread-level parallelism, the main problem is how to perform synchronization, data movement, IPC, and thread management between the RISC and the DSP efficiently. Inefficiency in these areas degrades system performance and widens the gap between delivered and peak performance. This thesis proposes a priority-based, multithreaded coprocessor interface, the host processor interface (HPI), for dual-core/multi-core multimedia SoCs to address these problems.
To achieve zero instruction latency and hide both short and long pipeline stalls, the DSP is designed as an 8-thread IMT core in which the eight threads are switched every cycle in an interleaved, round-robin order. The proposed HPI handles communication with the host processor and process/thread management, and it dispatches threads by priority, so that a high-priority thread that is ready to execute without stalling is not slowed down by thread switching. Simulation results for a JPEG encoding example show that, with the HPI, the utilization of the 8-thread IMT DSP improves from 55% to 93%, while the area added by the HPI is only 6.25% of the DSP core. |
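The stall-hiding effect of cycle-by-cycle round-robin interleaving described in the abstract can be illustrated with a minimal sketch. This is a toy model under stated assumptions, not the thesis's simulator: the function name `imt_utilization`, the single-issue pipeline, and the latency pattern `[1, 1, 2, 4]` are all hypothetical, and the priority-based dispatch of the HPI is not modeled. Each thread may issue its next instruction only after the latency of its previous one has elapsed, and a thread that is not ready when its turn comes wastes that issue slot.

```python
from itertools import cycle


def imt_utilization(num_threads, latencies, cycles=8000):
    """Fraction of issue slots used under strict round-robin interleaving.

    Toy model (an assumption, not the thesis design): one issue slot per
    cycle, thread (cycle % num_threads) owns the slot, and each thread
    repeats the same instruction-latency pattern forever.
    """
    ready_at = [0] * num_threads                      # earliest cycle each thread may issue
    streams = [cycle(latencies) for _ in range(num_threads)]
    issued = 0
    for now in range(cycles):
        t = now % num_threads                         # round-robin slot owner
        if ready_at[t] <= now:                        # previous result is available
            ready_at[t] = now + next(streams[t])      # issue; next issue waits for latency
            issued += 1
        # otherwise the slot is a pipeline bubble
    return issued / cycles


if __name__ == "__main__":
    pattern = [1, 1, 2, 4]                            # hypothetical latencies in cycles
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s): utilization = {imt_utilization(n, pattern):.2f}")
```

In this toy model a single thread fills only half of the issue slots, while eight interleaved threads fill all of them, because every thread's turn comes around no sooner than its longest instruction latency. The numbers are illustrative only and are not meant to reproduce the 55% and 93% figures reported for the JPEG encoding simulation.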
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009395545 http://hdl.handle.net/11536/80378 |
Appears in Collections: | Thesis |