Title: A Floating Point Accelerator for Machine Learning Application (基於機器學習之浮點加速器設計)
Authors: Hung, Tzu-Hsun (洪子訓)
Lee, Chen-Yi (李鎮宜)
Department of Electronics Engineering and Institute of Electronics
Keywords: machine learning; floating point; ZedBoard
Issue Date: 2013
Abstract: In recent years, machine learning has been widely applied in many areas. Because machine learning algorithms require a large number of arithmetic operations, accelerating those operations reduces total elapsed time. With the big data era approaching, we aim to integrate the machine learning engine directly with storage rather than rely on traditional data access from memory. To handle large amounts of data, a floating-point format is used for higher precision. So that the floating-point unit (FPU) can serve different applications, the user is given an input parameter that trades precision against latency; for some algorithms, precision can be lowered to accelerate the arithmetic with almost the same results. The FPU is implemented with the Coordinate Rotation Digital Computer (CORDIC) algorithm and supports four operations: natural exponential, natural logarithm, square root, and division. In a UMC 90 nm process, the FPU reaches 250 MHz in synthesis. The square root and division operations are verified against IBM's preliminary IEEE 754R binary model using round-to-nearest, ties-to-even rounding. The FPU is also verified on the ZedBoard platform, which combines a dual-core ARM Cortex-A9 processing system with 28 nm Xilinx programmable logic. On the ZedBoard, the performance of the ARM Cortex-A9 running at 666.67 MHz is compared with that of the custom FPU running at 38.8 MHz.
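The precision/latency trade-off mentioned in the abstract can be illustrated with a minimal software sketch of linear-mode CORDIC division. This is not the thesis's hardware design: the function name `cordic_divide`, the `iterations` parameter, and the restriction to x > 0 with |y / x| < 2 are assumptions made for illustration, and the sketch works on ordinary Python floats rather than on an IEEE 754 datapath. Each iteration uses only a shift-like scaling by a power of two and an addition, and each extra iteration halves the worst-case error, so fewer iterations mean lower latency at lower precision.

```python
def cordic_divide(y, x, iterations):
    """Approximate y / x with linear-mode CORDIC (vectoring).

    Illustrative assumptions: x > 0 and |y / x| < 2, which is the
    convergence range of this simple form.
    """
    if x <= 0:
        raise ValueError("this sketch assumes x > 0")
    if abs(y) >= 2 * x:
        raise ValueError("this sketch assumes |y / x| < 2")
    z = 0.0
    step = 1.0  # holds 2**-i and is halved after every iteration
    for _ in range(iterations):
        # Choose the direction that drives the residual y toward zero.
        d = 1.0 if y >= 0 else -1.0
        y -= d * x * step   # in hardware: a shifted copy of x added or subtracted
        z += d * step       # accumulate the quotient one signed bit at a time
        step *= 0.5
    return z


# Fewer iterations: lower latency, larger error. More iterations: the reverse.
for n in (8, 16, 24):
    approx = cordic_divide(1.0, 3.0, n)
    print(n, approx, abs(approx - 1.0 / 3.0))
```

After n iterations the residual satisfies |y| <= x * 2^-(n-1), so the quotient error is at most 2^-(n-1). That bound is what makes an iteration count a natural precision knob in a design of this kind.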
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070050221
http://hdl.handle.net/11536/75986
Appears in Collections: Theses