Low Power Low Voltage Datapath Design

Title:	Low Power Low Voltage Datapath Design
Authors:	王儷蓉 Wang, Li-Rong; 周世傑 Jou, Shyh-Jye; Department of Electronics Engineering and Institute of Electronics
Keywords:	low power; low voltage; datapath; double-edge-triggered flip-flop; level-converting; reconfigurable; mixed-Vt; MAC
Issue date:	2012
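Abstract:	Digital arithmetic plays a critical role in the high-performance processors and embedded systems required by communication systems, digital signal processing, and parallel computing. Multiplication and multiply-accumulate (MAC) are the most common of these operations and are key to any digital arithmetic module. To raise speed and cut power, this thesis proposes a regularly structured modified Booth encoding (MBE, also called radix-4) multiplier that can be applied to the design of a reconfigurable MAC core. The multiplier uses our improved modified Booth encoder and selector to remove the extra partial-product row, and adopts a hybrid two's complement circuit to reduce area and improve speed. The multiplier is then used to build a 32-bit reconfigurable MAC that can perform one 32×32, two 16×16, or four 8×8 signed multiply-accumulate operations at the same time. In experiments with a 130nm CMOS process, when implemented with a single-threshold-voltage (SVT) standard cell library, our multiplier saves 15.8% area and 11.7% power compared with a conventional modified Booth multiplier. When our reconfigurable MAC is implemented with a mixed-threshold-voltage (MVT) standard cell library, it saves 4.2% area and 7.4% power compared with previously published reconfigurable MAC designs. The MAC core can also be fully pipelined so that it runs at a higher speed. The MAC core, together with a built-in self-test (BIST) circuit, was designed and implemented as a reconfigurable MAC chip using our own academic 130nm CMOS MVT standard cell library. With post-layout extracted delay data back-annotated into dynamic simulation, the simulated power consumption is 86.25mW when the chip operates at 500MHz and 1.2V. The chip has been fabricated and tested. Because of the limits of the test equipment, the reconfigurable MAC was measured at 125MHz, where it consumes 12.5mW, excluding the power of the I/O pins and the BIST circuit.
Many recent system-on-chip designs use more pipeline stages to raise throughput, which adds a large number of registers to the chip. Reducing the power of these registers therefore reduces the power of the whole chip. This thesis proposes a new sense-amplifier (SA) based double-edge-triggered implicitly level-converting flip-flop (SA_DET-LCFF). The positive feedback of the sense amplifier mitigates the large crossover current that arises in the conventional differential static double-edge-triggered flip-flop (DS_DET-FF), which substantially reduces flip-flop power and improves speed. In addition, the proposed double-edge-triggered flip-flop (DETFF) provides sufficient low-to-high voltage level conversion without degrading system performance. In experiments with a 130nm CMOS process, with both designs built from single-threshold-voltage transistors and a 0.84V input converted to a 1.2V output, the proposed flip-flop lowers the power-delay product (PDP) by 64% relative to the conventional DS_DET-FF. Implemented with mixed threshold voltages, it lowers the PDP by 78%. The proposed SA_DET-LCFF is therefore well suited to dynamic voltage frequency scaling (DVFS) platforms for low-power, low-voltage applications.
Multiplication and multiply-accumulate (MAC) are very common mathematical operations and are key elements of a digital arithmetic module.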
To improve performance and reduce power, a well-structured modified Booth encoding (MBE) multiplier that can be applied to the design of a reconfigurable MAC core is proposed. The multiplier adopts an improved Booth encoder and selector to remove the extra partial-product row, and uses a hybrid approach in the two's complement circuit to reduce area and improve speed. The multiplier is then used to form a 32-bit reconfigurable MAC that can be flexibly configured to execute one 32×32, two 16×16, or four 8×8 signed multiply-accumulate operations. Experimentally, when implemented with a 130nm CMOS single-Vt standard cell library and operated at 100MHz, the proposed multiplier, with its two variants of the two's complement circuit, achieves savings of 15.8% in area and 11.7% in power, respectively, over the prior design. When implemented with a mixed-Vt (MVT) standard cell library at 100MHz, the proposed reconfigurable MAC achieves a 4.2% area saving and a 7.4% power saving over previously published MAC designs. A chip containing the proposed MAC core and a built-in self-test (BIST) circuit is designed and implemented with the proposed 130nm academic-purpose MVT cell library. Post-layout simulation shows that the chip can operate at 500MHz with a power consumption of 86.25mW at a VDD of 1.2V. Because of the limitations of the testing equipment, the chip is measured at 125MHz, where the measured power consumption is 12.5mW, excluding the I/O pins and the BIST circuit. A new double-edge-triggered implicitly level-converting flip-flop based on the sense-amplifier latch structure (SA_DET-LCFF) is proposed. The positive feedback of the sense amplifier eases the problem of the relatively large crossover current in the conventional differential static double-edge-triggered flip-flop (DS_DET-FF), which significantly reduces power and improves performance.
Moreover, this design provides sufficient level conversion from a lower to a higher supply voltage without degrading circuit performance. Experimentally, when implemented in a 130nm process with single normal-Vt transistors and converting a 0.84V input to a 1.2V output, it achieves a 64% power-delay product (PDP) improvement over the DS_DET-FF. When implemented in the same technology but with mixed-Vt transistors, it achieves a 78% PDP improvement. Thus, the proposed SA_DET-LCFF is suitable for clustered voltage scaling (CVS) or dynamic voltage frequency scaling (DVFS) design platforms.
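To make the arithmetic in the abstract concrete, the following is a minimal behavioral sketch in Python of textbook radix-4 (modified Booth) recoding and of a MAC that splits 32-bit operand words into one 32-bit, two 16-bit, or four 8-bit signed lanes. It is not the thesis's circuit: the improved encoder and selector, the extra-row removal, and the hybrid two's complement logic are not modeled. The function names, the lane packing order (lane 0 in the least significant bits), and the unbounded accumulators are all assumptions made for illustration.

```python
def booth_radix4_digits(y: int, bits: int) -> list[int]:
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits in {-2..2}."""
    assert bits % 2 == 0
    assert -(1 << (bits - 1)) <= y < (1 << (bits - 1))
    u = y & ((1 << bits) - 1)      # two's complement bit pattern of y
    digits = []
    prev = 0                       # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)   # textbook radix-4 Booth digit
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int) -> int:
    """Sum the partial products selected by the Booth digits of y."""
    return sum(d * x * (4 ** i)
               for i, d in enumerate(booth_radix4_digits(y, bits)))


def to_signed(v: int, bits: int) -> int:
    """Interpret the low `bits` bits of v as a two's complement number."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def mac_step(acc: list[int], a: int, b: int, lane_bits: int) -> list[int]:
    """One multiply-accumulate on 32-bit words a and b, split into signed lanes.

    lane_bits = 32, 16, or 8 gives 1, 2, or 4 independent lanes.
    Accumulators are plain Python ints (assumption: width not modeled).
    """
    assert lane_bits in (32, 16, 8)
    lanes = 32 // lane_bits
    assert len(acc) == lanes
    out = []
    for k in range(lanes):
        x = to_signed(a >> (k * lane_bits), lane_bits)
        y = to_signed(b >> (k * lane_bits), lane_bits)
        out.append(acc[k] + booth_multiply(x, y, lane_bits))
    return out
```

For example, `mac_step([0, 0], 0x0002FFFF, 0x00030004, 16)` treats each word as two 16-bit signed lanes and accumulates the two products independently. Recoding halves the number of partial-product rows relative to a bit-by-bit multiplier, which is the property the MBE multiplier in the abstract builds on.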
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT079211831 http://hdl.handle.net/11536/72455
Appears in Collections:	Thesis