Title: Neural Network Systems with Bandwidth Reduction Method (應用於類神經網路之頻寬降低方法)
Authors: Yu, Wan-Ju (喻婉茹)
Lee, Chen-Yi (李鎮宜)
Institute of Electronics
Keywords: Neural network; low bandwidth
Issue Date: 2016
Abstract: In recent years, deep learning algorithms have been widely applied in many systems to provide intelligent processing capabilities such as data classification and object detection. To achieve high inference accuracy, the structures of deep learning models have become increasingly complex, so external memory is required to store the large amount of data in a pre-trained model. However, the long access time of external memory limits overall system performance. In this thesis, a low-memory-bandwidth neural network (NN) engine is designed and implemented for data classification applications. By exploiting the proposed memory bandwidth reduction and coefficient mapping methods, an external memory management unit is implemented to access the reduced NN model and effectively lower the required memory bandwidth. In addition, clock gating is applied in the proposed data processing unit (DPU) and the neuron buffer module to reduce power consumption. Implemented on a Xilinx Virtex-7 FPGA, the system integrates 32 of the proposed NN processors for parallel computation and achieves a throughput of 3.4 G synapse/s with an energy efficiency of 7.2 nJ/synapse. Operating at 125 MHz, the prototype occupies 192.2k LUTs, 202.5k flip-flops, and 514 block RAMs.
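As a rough sanity check on the reported figures, the sketch below relates the 32 parallel processors and the 125 MHz clock to the stated 3.4 G synapse/s throughput and 7.2 nJ/synapse energy efficiency. It assumes, purely for illustration (the abstract does not state this), that each processor completes one synapse operation per clock cycle; the derived utilization and power numbers are therefore indicative only.

```python
# Back-of-envelope consistency check of the figures quoted in the abstract.
# Assumption (not stated in the abstract): each of the 32 NN processors
# completes one synapse (multiply-accumulate) operation per clock cycle.

NUM_PROCESSORS = 32          # parallel NN processors integrated on the FPGA
CLOCK_HZ = 125e6             # reported operating frequency, 125 MHz

REPORTED_THROUGHPUT = 3.4e9  # synapse/s, from the abstract
REPORTED_ENERGY = 7.2e-9     # J/synapse, from the abstract

# Theoretical peak throughput under the one-synapse-per-cycle assumption.
peak_throughput = NUM_PROCESSORS * CLOCK_HZ          # 4.0e9 synapse/s
utilization = REPORTED_THROUGHPUT / peak_throughput  # ~0.85

# Average power implied if both reported figures hold simultaneously.
implied_power_w = REPORTED_THROUGHPUT * REPORTED_ENERGY  # ~24.5 W

print(f"Peak throughput : {peak_throughput / 1e9:.1f} G synapse/s")
print(f"Utilization     : {utilization:.0%}")
print(f"Implied power   : {implied_power_w:.1f} W")
```

Under this assumption the reported 3.4 G synapse/s corresponds to roughly 85% utilization of the 4.0 G synapse/s peak, which is plausible for a design whose external-memory accesses are the main bottleneck being addressed.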
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070350263
http://hdl.handle.net/11536/139946
Appears in Collections: Thesis