Full metadata record
DC Field | Value | Language
dc.contributor.author | 李其駿 | zh_TW
dc.contributor.author | 劉志尉 | zh_TW
dc.contributor.author | Li, Chi-Jiun | en_US
dc.contributor.author | Liu, Chih-Wei | en_US
dc.date.accessioned | 2018-01-24T07:42:44Z | -
dc.date.available | 2018-01-24T07:42:44Z | -
dc.date.issued | 2017 | en_US
dc.identifier.uri | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070450242 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/142855 | -
dc.description.abstract | Deep convolutional neural networks (DCNNs) demand a large amount of computation. We propose a low-complexity, high-accuracy computing engine that uses Static Floating-Point (SFP) arithmetic, so that every operation works on the effective (non-zero) bits of the data; this improves energy efficiency and achieves higher accuracy within a limited 8-bit data width. In addition, we adopt the data scheduling of the Scalable Universal Matrix Multiplication Algorithm (SUMMA), which avoids storing duplicated data and lets operands be broadcast to the computing engines that need them. The embedded lightweight stream interface unit buffers greatly reduce how often operands access the register file, lowering energy consumption. Simulation results show that, for the 5 convolutional layers of AlexNet at comparable throughput, the proposed SFP SUMMA deep inference processor saves about 40% of the power consumed by the MIT Eyeriss accelerator (167 mW vs. 278 mW). On the ImageNet dataset, the proposed design reaches about 56.47% Top-1 accuracy (note: the GPU baseline is about 56.90%), whereas MIT Eyeriss reaches only about 50.18%. Synthesized in TSMC 90 nm CMOS technology, the proposed SFP SUMMA DIP delivers 0.45 TOPs/W; by contrast, running the same 5 AlexNet convolutional layers, MIT Eyeriss delivers only about 0.3 TOPs/W (@65 nm CMOS). | zh_TW
dc.description.abstract | We propose a high-accuracy and cost-effective array processor for Deep Convolution Neural Network (DCNN) inference applications. The proposed Static Floating-Point (SFP) arithmetic lets the MAC operations work on the non-zero bits of the data, which guarantees both the energy efficiency and the accuracy of the proposed computing engine. Moreover, by applying the scalable universal matrix multiplication algorithm (SUMMA), we avoid storing repeated data in local storage, and data can be broadcast to the corresponding PEs. With the proposed simple stream interface unit (SIU), the design greatly reduces how often operands (data or weights) are read from or written to the central register file (CRF), minimizing power consumption. Simulation results reveal that the proposed SFP SUMMA array processor achieves approximately 56.47% Top-1 accuracy while consuming only 167 mW. Synthesized in TSMC 90 nm CMOS technology, the proposed SFP SUMMA DIP achieves 0.45 TOPs/W. By contrast, performing the same workload of the 5 convolutional layers of AlexNet, MIT Eyeriss delivers only 0.3 TOPs/W (@65 nm CMOS). | en_US
dc.language.iso | en_US | en_US
dc.subject | neural network | zh_TW
dc.subject | accelerator | zh_TW
dc.subject | static floating-point arithmetic | zh_TW
dc.subject | array processor | zh_TW
dc.subject | convolutional neural network | zh_TW
dc.subject | convolution neural network | en_US
dc.subject | accelerator | en_US
dc.subject | static floating point | en_US
dc.subject | CNN inference | en_US
dc.subject | array processor | en_US
dc.title | A high-accuracy, cost-effective static floating-point outer-product array processor for convolutional neural network applications | zh_TW
dc.title | A High-accuracy and Cost-effective SFP SUMMA Array Processor for CNN Inference Application | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Electronics | zh_TW
Appears in Collections: Thesis
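
The SUMMA scheduling described in the abstracts reformulates matrix multiplication as a sum of rank-1 (outer-product) updates: at each step, one column of A and one row of B are broadcast once to all processing engines, so no operand needs to be stored in duplicate. A minimal NumPy sketch of that outer-product schedule (the hardware pipelining and SIU buffering are, of course, omitted):

```python
import numpy as np

def summa_matmul(A, B):
    """SUMMA-style (outer-product) matrix multiply C = A @ B.

    At step k, column A[:, k] and row B[k, :] are broadcast to all PEs,
    and each PE accumulates one element of the rank-1 update."""
    m, K = A.shape
    K2, n = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((m, n))
    for k in range(K):
        # Broadcast one column and one row; accumulate their outer product.
        C += np.outer(A[:, k], B[k, :])
    return C
```

This is only a functional model of the dataflow; it shows why each input element is read exactly once per accumulation step, which is the property the abstract credits for avoiding duplicated local storage.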
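
The static floating-point (SFP) arithmetic in the abstracts keeps operands at 8 bits while concentrating precision on the effective bits. One common reading of such a format is a block scheme: an 8-bit signed mantissa with a single exponent fixed per tensor (or per layer). The sketch below illustrates that idea only; the exact format and exponent-selection rule used in the thesis may differ.

```python
import numpy as np

def sfp_quantize(x, bits=8):
    """Quantize a tensor to a hypothetical SFP format: signed `bits`-bit
    mantissas sharing one static exponent chosen so the largest magnitude
    fills the mantissa range (an assumed rule, not the thesis's)."""
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros(x.shape, dtype=np.int32), 0
    # Shared exponent: scale so max_abs maps near the top of the range.
    exp = int(np.ceil(np.log2(max_abs))) - (bits - 1)
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / 2.0 ** exp), lo, hi)
    return q.astype(np.int32), exp

def sfp_dequantize(q, exp):
    """Recover real values from mantissas and the shared exponent."""
    return q.astype(np.float64) * 2.0 ** exp
```

Because the exponent is fixed ahead of time, the MACs themselves reduce to integer arithmetic on the 8-bit mantissas, which is consistent with the abstract's claim of floating-point-like accuracy at fixed-point-like cost.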