Title: | 即時的視覺特徵運算之SIFT快速演算法暨硬體設計 Fast SIFT Algorithm And Its Design For Real-Time Visual Feature Extraction |
Authors: | 邱亮齊 Chiu, Liang-Chi; 張添烜 Chang, Tian-Sheuan; Institute of Electronics |
Keywords: | SIFT; Real-time computation; Visual features; Feature extraction; Integral image |
Issue Date: | 2011 |
Abstract: | The scale-invariant feature transform (SIFT) is widely used in computer vision for object recognition. However, the algorithm suffers from long frame-level latency, heavy computation, and high memory usage, because it repeatedly applies Gaussian blur to the whole image and then searches for feature points in the differences between images at different blur levels.
To address these problems, this thesis proposes a layer-parallel SIFT (LPSIFT) algorithm based on the integral image, together with a parallel hardware design for real-time applications. First, to avoid the frame-level latency, a layer-parallel restructured box kernel replaces the iterated Gaussian blur; the box-kernel computation is further simplified by using the integral image and reusing sub-kernel sums. For keypoint localization, the complex low-contrast analysis is replaced by a simple low-brightness test. In hardware, an on-the-fly schedule and architecture compute the filtered layers and search for feature points concurrently, so only a small part of the frame has to be stored rather than the whole frame. In addition, the costly divider and inverse square root units are replaced by a low-cost universal operation unit that takes a number of cycles equivalent to the required precision (precision equivalent cycles, PEC), which reduces the gate count.
Compared with the original SIFT algorithm, the proposed algorithm reduces the software computation by 90% and the memory usage by 95%. The hardware implementation uses 58K gates in UMC 90-nm CMOS technology. At a 100 MHz clock, it provides about 6000 feature points per frame for VGA video at 30 frames per second, and about 2000 feature points per frame for 1920 × 1080 video at 30 frames per second. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079811602 http://hdl.handle.net/11536/46768 |
Appears in Collections: | Thesis |
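The abstract's central idea, replacing iterated Gaussian blur with box kernels evaluated on an integral image, can be illustrated with a minimal sketch. This is a generic summed-area-table example and not the thesis's LPSIFT implementation: the function names, the window radii, and the difference-of-boxes response are assumptions chosen for illustration, and the record does not specify the actual kernel sizes, layer structure, or sub-kernel reuse scheme.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of img[y][0..x]
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of the height x width window at (top, left) using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def box_mean(ii, cy, cx, radius):
    """Mean of the (2*radius+1)^2 window centred at (cy, cx); it must fit in the image."""
    size = 2 * radius + 1
    return box_sum(ii, cy - radius, cx - radius, size, size) / (size * size)


def difference_of_boxes(ii, cy, cx, small_radius, large_radius):
    """Hypothetical stand-in for a difference-of-Gaussians response (assumption)."""
    return box_mean(ii, cy, cx, small_radius) - box_mean(ii, cy, cx, large_radius)


# A 5 x 5 test image: a single bright pixel on a dark background.
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 9
spike_ii = integral_image(spike)
response = difference_of_boxes(spike_ii, 2, 2, 1, 2)
```

Each box sum costs four lookups regardless of window size, which is why box kernels on an integral image avoid the repeated whole-frame passes that iterated Gaussian blur requires.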