管線化及叢集化之超長指令集數位信號處理器之高效能資料路徑設計 (Efficient Datapath Design for Clustered & Pipelined VLIW DSP Processors)

Full metadata record

DC Field	Value	Language
dc.contributor.author	蕭丕承	en_US
dc.contributor.author	Pi-Chen Hsiao	en_US
dc.contributor.author	劉志尉	en_US
dc.contributor.author	Chih-Wei Liu	en_US
dc.date.accessioned	2014-12-12T02:26:08Z	-
dc.date.available	2014-12-12T02:26:08Z	-
dc.date.issued	2005	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT009211680	en_US
dc.identifier.uri	http://hdl.handle.net/11536/67568	-
dc.description.abstract	大部分的數位訊號處理應用程式都具有高度資料階層以及指令階層平行度的特性，因此可以藉由叢集化以及加深管線的方式來增加資料路徑的效率。然而，複雜的前饋(forwarding)網路以及叢集間連結(inter-cluster communication)網路抵銷了叢集化及加深管線所提升的效能。這篇論文以標準單元設計為基礎利用正反器和多工器來分析前饋單元與叢集間連結機制的複雜度。藉由這些分析，我們提出複雜度感知的前饋單元架構和以記憶體讀取/寫入(load/store)指令為基礎的叢集間連結機制。除此之外，我們還提出了分散式乒乓暫存器組架構來進一步降低叢集內暫存器組的複雜度。在實作的部分，我們使用UMC 0.13um 1P8M CMOS製程來實現我們的設計。實驗的結果顯示，我們提出的前饋單元架構可以增加13.2%的運作時脈，而分散式乒乓暫存器組搭配我們提出的叢集間連結機制則可以減少76.8%的面積和46.9%的暫存器存取時間。對於可攜帶型裝置的應用方面，我們另外提出了與原本應用程式完全相容的折疊式資料路徑架構。比起原本的設計，這種架構可以節省55.33%的面積和增加26.3%的運作速度。最後，我們利用前述的前饋單元和叢集架構設計並實現了一個完整的4-way 超長指令集(VLIW)數位訊號處理器。實作與模擬的結果顯示在UMC 0.13um 1P8M CMOS的製程下，其最高工作頻率為333MHz，且具有近似於現在市面上數位訊號處理器的運算能力。	zh_TW
dc.description.abstract	Most DSP applications exhibit a high degree of data-level and instruction-level parallelism, which enables efficient datapath design through clustering and deep pipelining. However, the ad-hoc data forwarding and inter-cluster communication networks in most processors significantly offset these advantages. This thesis presents analytical formulae, based on a cell-based implementation using flip-flops and multiplexers, for analyzing the complexity of the forwarding unit and of inter-cluster communication mechanisms. Based on this analysis, we propose a complexity-aware data forwarding architecture and a simple inter-cluster communication mechanism based on load/store instruction pairs. Moreover, we introduce a distributed & ping-pong register file to further reduce the complexity of the register file inside each cluster. In experiments with UMC 0.13 µm 1P8M CMOS technology, the proposed forwarding architecture improves the cycle time by 13.2%, while the distributed ping-pong register file combined with the proposed inter-cluster communication mechanism reduces the register file area and access time by 76.8% and 46.9%, respectively. For portable applications, we further propose a binary-compatible folded datapath that, compared with the original design, saves 55.33% of the area and increases the clock speed by 26.3%. Finally, we implement the proposed forwarding unit and inter-cluster communication mechanism, with the distributed & ping-pong register file organization, in a complete 4-way VLIW DSP processor that operates at up to 333 MHz and shows performance comparable to that of state-of-the-art DSPs.	en_US
dc.language.iso	en_US	en_US
dc.subject	前饋	zh_TW
dc.subject	叢集化	zh_TW
dc.subject	暫存器組	zh_TW
dc.subject	超長指令集	zh_TW
dc.subject	數位信號處理器	zh_TW
dc.subject	管線化	zh_TW
dc.subject	forwarding	en_US
dc.subject	clustering	en_US
dc.subject	register file	en_US
dc.subject	very long instruction word (VLIW)	en_US
dc.subject	digital signal processor (DSP)	en_US
dc.subject	pipelining	en_US
dc.title	管線化及叢集化之超長指令集數位信號處理器之高效能資料路徑設計	zh_TW
dc.title	Efficient Datapath Design for Clustered & Pipelined VLIW DSP Processors	en_US
dc.type	Thesis	en_US
dc.contributor.department	電子研究所 (Institute of Electronics)	zh_TW
Appears in Collections:	Thesis
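The English abstract describes estimating forwarding-unit complexity by counting flip-flops and multiplexers in a cell-based implementation. The thesis's own formulae are not reproduced in this record, so the following is only a minimal Python sketch under generic, assumed conditions: each functional unit has R source operands; each operand's forwarding multiplexer selects among the results of S forwardable pipeline stages from every unit it can see, plus the register-file value; forwarding is allowed only within a cluster; and an M-input multiplexer is built from M - 1 two-input multiplexers. All names (DatapathConfig, mux_inputs_per_operand, two_to_one_mux_count) and parameter values are hypothetical and serve only to show why confining forwarding to a cluster shrinks the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatapathConfig:
    # Hypothetical parameters for a generic model, not taken from the thesis.
    issue_width: int     # N: functional units issued per cycle (4 for a 4-way VLIW)
    src_operands: int    # R: source operands per functional unit
    forward_stages: int  # S: pipeline stages whose results can be forwarded
    clusters: int = 1    # C: forwarding is allowed only inside a cluster


def mux_inputs_per_operand(cfg: DatapathConfig) -> int:
    """Inputs of one operand's forwarding mux: visible results plus the register file."""
    if cfg.clusters < 1 or cfg.issue_width % cfg.clusters != 0:
        raise ValueError("issue_width must be divisible by a positive cluster count")
    units_per_cluster = cfg.issue_width // cfg.clusters
    return units_per_cluster * cfg.forward_stages + 1


def two_to_one_mux_count(cfg: DatapathConfig) -> int:
    """Total 2:1 muxes, assuming each M-input mux is a tree of M - 1 two-input muxes."""
    per_operand = mux_inputs_per_operand(cfg) - 1
    return cfg.issue_width * cfg.src_operands * per_operand


if __name__ == "__main__":
    flat = DatapathConfig(issue_width=4, src_operands=2, forward_stages=3)
    clustered = DatapathConfig(issue_width=4, src_operands=2, forward_stages=3, clusters=2)
    for name, cfg in (("flat", flat), ("2 clusters", clustered)):
        # Prints "flat 13 96" and then "2 clusters 7 48".
        print(name, mux_inputs_per_operand(cfg), two_to_one_mux_count(cfg))
```

Under this assumed model the two-input multiplexer count is N²·R·S/C: with 4 units, 2 operands per unit, and 3 forwardable stages, the flat network needs 96 two-input multiplexers, while two clusters of two units need 48. Wider issue (larger N) and deeper pipelines (larger S) therefore grow the network quickly, which is consistent with the abstract's motivation for complexity-aware forwarding.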