Title: A Floating Point Accelerator for Machine Learning Application
Authors: Hung, Tzu-Hsun (洪子训)
Lee, Chen-Yi (李镇宜)
Department of Electronics Engineering and Institute of Electronics
Keywords: machine learning; floating point; ZedBoard
Issue Date: 2013
Abstract: Machine learning has recently been widely applied in many different fields. Because machine learning algorithms require a large number of arithmetic operations, accelerating those operations can reduce total elapsed time. With the big data era approaching, we aim to integrate the machine learning engine directly into storage rather than rely on traditional data access from memory. A floating-point format is used to provide more precision when handling large amounts of data. To suit different future applications, the floating-point unit (FPU) exposes an input parameter that lets the user trade off precision against latency; for some algorithms, precision can be lowered to accelerate the arithmetic while producing almost the same results. The FPU is implemented using the Coordinate Rotation Digital Computer (CORDIC) algorithm and supports four operations: natural exponential, natural logarithm, square root, and division. In a UMC 90 nm process, the FPU achieves 250 MHz in synthesis. The square root and division operations are verified against the IBM preliminary IEEE 754R binary model using round-to-nearest, ties-to-even rounding. The FPU is also verified on the ZedBoard platform, which contains a dual-core ARM Cortex-A9 based processing system and 28 nm Xilinx programmable logic. Performance is evaluated on the ZedBoard by comparing the ARM Cortex-A9 running at 666.67 MHz with the custom FPU running at 38.8 MHz.
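The trade-off between precision and latency in a CORDIC-based divider can be illustrated with a minimal sketch. It is not the thesis's hardware design. It assumes the standard linear-mode (vectoring) CORDIC recurrence, uses Python floats in place of fixed-point registers, and the function name `cordic_divide` and its `iterations` parameter are hypothetical names chosen for illustration. Each iteration adds roughly one bit of quotient precision, so fewer iterations mean lower latency and a coarser result.

```python
def cordic_divide(y, x, iterations=24):
    """Approximate y / x with linear-mode CORDIC (vectoring).

    Illustrative assumptions: x > 0 and |y / x| < 2, which holds for
    normalized mantissas in [1, 2). After n iterations the error is
    at most 2 ** -(n - 1).
    """
    if x <= 0 or abs(y) >= 2 * x:
        raise ValueError("requires x > 0 and |y / x| < 2")
    z = 0.0      # accumulated quotient
    step = 1.0   # 2 ** -i, halved every iteration (a shift in hardware)
    for _ in range(iterations):
        d = 1.0 if y >= 0 else -1.0  # choose the direction that drives y toward 0
        y -= d * x * step            # shift-and-add update of the residual
        z += d * step                # record the signed step in the quotient
        step *= 0.5
    return z


# Usage: the same inputs at two precision settings.
coarse = cordic_divide(1.5, 1.25, iterations=8)
fine = cordic_divide(1.5, 1.25, iterations=24)
```

Under these assumptions, the iteration count plays the role of the user-visible precision parameter described in the abstract: `coarse` is within 2^-7 of the true quotient and `fine` is within 2^-23.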
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070050221
http://hdl.handle.net/11536/75986
Appears in Collections: Thesis