Full metadata record
DC Field | Value | Language
dc.contributor.author | 萧丕承 | en_US
dc.contributor.author | Pi-Chen Hsiao | en_US
dc.contributor.author | 刘志尉 | en_US
dc.contributor.author | Chih-Wei Liu | en_US
dc.date.accessioned | 2014-12-12T02:26:08Z | -
dc.date.available | 2014-12-12T02:26:08Z | -
dc.date.issued | 2005 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT009211680 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/67568 | -
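dc.description.abstract | Most digital signal processing applications exhibit a high degree of data-level and instruction-level parallelism, so datapath efficiency can be increased through clustering and deeper pipelining. However, the complex forwarding network and the inter-cluster communication network offset the performance gained from clustering and deeper pipelining. Based on standard-cell design, this thesis uses flip-flops and multiplexers to analyze the complexity of the forwarding unit and of inter-cluster communication mechanisms. From these analyses, we propose a complexity-aware forwarding unit architecture and an inter-cluster communication mechanism based on memory load/store instructions. In addition, we propose a distributed ping-pong register file architecture to further reduce the complexity of the register file within each cluster. For the implementation, we use the UMC 0.13 µm 1P8M CMOS process. Experimental results show that the proposed forwarding unit architecture increases the operating clock rate by 13.2%, while the distributed ping-pong register file combined with the proposed inter-cluster communication mechanism reduces area by 76.8% and register access time by 46.9%. For portable-device applications, we additionally propose a folded datapath architecture that is fully compatible with the original application programs. Compared with the original design, this architecture saves 55.33% of the area and increases operating speed by 26.3%. Finally, using the forwarding unit and cluster architecture described above, we design and implement a complete 4-way very long instruction word (VLIW) digital signal processor. Implementation and simulation results show that, in the UMC 0.13 µm 1P8M CMOS process, its maximum operating frequency is 333 MHz and its computing capability is close to that of digital signal processors currently on the market. | zh_TW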
dc.description.abstract | Most DSP applications feature a high degree of data-level and instruction-level parallelism, which enables efficient datapath design with clustering and deep pipelining. However, the ad-hoc data forwarding and inter-cluster communication in most processors significantly offset these advantages. This thesis presents analytical formulae, based on cell-based implementation with flip-flops and multiplexers, for analyzing the complexity of the forwarding unit and of inter-cluster communication mechanisms. We also propose a complexity-aware data forwarding architecture and a simple inter-cluster communication mechanism based on load/store instruction pairs. Moreover, we introduce the distributed & ping-pong register file to further reduce the complexity of the register file inside each cluster. In experiments with UMC 0.13 µm 1P8M CMOS technology, the proposed forwarding architecture improves cycle time by 13.2%, while the distributed ping-pong register file combined with the proposed inter-cluster communication mechanism reduces the area and access time of the register file by 76.8% and 46.9%, respectively. For portable applications, we propose a binary-compatible folded datapath that saves 55.33% of the area and increases the clock speed by 26.3%. Finally, we implement the proposed forwarding unit and the proposed inter-cluster communication mechanism, with the distributed & ping-pong register file organization, in a complete 4-way VLIW DSP processor that operates at 333 MHz and delivers performance comparable to state-of-the-art DSPs. | en_US
dc.language.iso | en_US | en_US
dc.subject | forwarding | zh_TW
dc.subject | clustering | zh_TW
dc.subject | register file | zh_TW
dc.subject | very long instruction word | zh_TW
dc.subject | digital signal processor | zh_TW
dc.subject | pipelining | zh_TW
dc.subject | forwarding | en_US
dc.subject | clustering | en_US
dc.subject | register file | en_US
dc.subject | very long instruction word (VLIW) | en_US
dc.subject | digital signal processor (DSP) | en_US
dc.subject | pipelining | en_US
dc.title | High-Performance Datapath Design for Pipelined and Clustered VLIW Digital Signal Processors | zh_TW
dc.title | Efficient Datapath Design for Clustered & Pipelined VLIW DSP Processors | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Electronics | zh_TW
Appears in Collections: Thesis