Full metadata record
DC Field | Value | Language
dc.contributor.author | 李其駿 | zh_TW
dc.contributor.author | 劉志尉 | zh_TW
dc.contributor.author | Li, Chi-Jiun | en_US
dc.contributor.author | Liu, Chih-Wei | en_US
dc.date.accessioned | 2018-01-24T07:42:44Z | -
dc.date.available | 2018-01-24T07:42:44Z | -
dc.date.issued | 2017 | en_US
dc.identifier.uri | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070450242 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/142855 | -
dc.description.abstract | Deep convolutional neural networks (DCNNs) demand a large amount of computation. We propose a low-complexity, high-accuracy computing engine that uses Static Floating-Point (SFP) arithmetic, so that every operation works on the effective (non-zero) bits of the data; this improves energy efficiency and achieves higher accuracy within a limited 8-bit data width. In addition, we adopt the data scheduling of the Scalable Universal Matrix Multiplication Algorithm (SUMMA), which avoids storing duplicated data and lets operands be broadcast to the computing engines that need them. The embedded lightweight stream interface unit buffers greatly reduce how often operands access the register file, lowering energy consumption. Simulation results show that, for the 5 convolutional layers of AlexNet at comparable throughput, the proposed SFP SUMMA deep inference processor saves about 40% of the power consumed by the MIT Eyeriss accelerator (167 mW vs. 278 mW). On the ImageNet dataset, the proposed design reaches about 56.47% Top-1 accuracy (note: the GPU baseline is about 56.90%), whereas MIT Eyeriss reaches only about 50.18%. Synthesized in TSMC 90 nm CMOS technology, the proposed SFP SUMMA DIP delivers 0.45 TOPs/W; by contrast, running the same 5 AlexNet convolutional layers, MIT Eyeriss delivers only about 0.3 TOPs/W (@65 nm CMOS). | zh_TW
dc.description.abstract | We propose a high-accuracy and cost-effective array processor for Deep Convolution Neural Network (DCNN) inference applications. The proposed Static Floating-Point (SFP) arithmetic lets the MAC operations work on the non-zero bits of the data, which guarantees both the energy efficiency and the accuracy of the proposed computing engine. Moreover, by applying the scalable universal matrix multiplication algorithm (SUMMA), we avoid storing repeated data in local storage, and data can be broadcast to the corresponding PEs. With the proposed simple stream interface unit (SIU), the design greatly reduces how often operands (data or weights) are read from or written to the central register file (CRF), minimizing power consumption. Simulation results reveal that the proposed SFP SUMMA array processor achieves approximately 56.47% Top-1 accuracy while consuming only 167 mW. Synthesized in TSMC 90 nm CMOS technology, the proposed SFP SUMMA DIP achieves 0.45 TOPs/W. By contrast, performing the same workload of the 5 convolutional layers of AlexNet, MIT Eyeriss delivers only 0.3 TOPs/W (@65 nm CMOS). | en_US
dc.language.iso | en_US | en_US
dc.subject | neural network | zh_TW
dc.subject | accelerator | zh_TW
dc.subject | static floating-point arithmetic | zh_TW
dc.subject | array processor | zh_TW
dc.subject | convolutional neural network | zh_TW
dc.subject | convolution neural network | en_US
dc.subject | accelerator | en_US
dc.subject | static floating point | en_US
dc.subject | CNN inference | en_US
dc.subject | array processor | en_US
dc.title | A high-accuracy, cost-effective static floating-point outer-product array processor for convolutional neural network applications | zh_TW
dc.title | A High-accuracy and Cost-effective SFP SUMMA Array Processor for CNN Inference Application | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Electronics | zh_TW
Appears in Collections: Thesis
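
The SUMMA scheduling described in the abstracts reformulates matrix multiplication as a sum of rank-1 (outer-product) updates: at each step, one column of A and one row of B are broadcast once to all processing engines, so no operand needs to be stored in duplicate. A minimal NumPy sketch of that outer-product schedule (the hardware pipelining and SIU buffering are, of course, omitted):

```python
import numpy as np

def summa_matmul(A, B):
    """SUMMA-style (outer-product) matrix multiply C = A @ B.

    At step k, column A[:, k] and row B[k, :] are broadcast to all PEs,
    and each PE accumulates one element of the rank-1 update."""
    m, K = A.shape
    K2, n = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((m, n))
    for k in range(K):
        # Broadcast one column and one row; accumulate their outer product.
        C += np.outer(A[:, k], B[k, :])
    return C
```

This is only a functional model of the dataflow; it shows why each input element is read exactly once per accumulation step, which is the property the abstract credits for avoiding duplicated local storage.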
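
The static floating-point (SFP) arithmetic in the abstracts keeps operands at 8 bits while concentrating precision on the effective bits. One common reading of such a format is a block scheme: an 8-bit signed mantissa with a single exponent fixed per tensor (or per layer). The sketch below illustrates that idea only; the exact format and exponent-selection rule used in the thesis may differ.

```python
import numpy as np

def sfp_quantize(x, bits=8):
    """Quantize a tensor to a hypothetical SFP format: signed `bits`-bit
    mantissas sharing one static exponent chosen so the largest magnitude
    fills the mantissa range (an assumed rule, not the thesis's)."""
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros(x.shape, dtype=np.int32), 0
    # Shared exponent: scale so max_abs maps near the top of the range.
    exp = int(np.ceil(np.log2(max_abs))) - (bits - 1)
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / 2.0 ** exp), lo, hi)
    return q.astype(np.int32), exp

def sfp_dequantize(q, exp):
    """Recover real values from mantissas and the shared exponent."""
    return q.astype(np.float64) * 2.0 ** exp
```

Because the exponent is fixed ahead of time, the MACs themselves reduce to integer arithmetic on the 8-bit mantissas, which is consistent with the abstract's claim of floating-point-like accuracy at fixed-point-like cost.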