Six-DOF Pose Estimation of Object Using Convolutional Neural Network Algorithm

Title:	Six-DOF Pose Estimation of Object Using Convolutional Neural Network Algorithm (基於卷積類神經網路之六自由度物體姿態估測演算法)
Authors:	鄭祐晨 (Zheng, Yo-Chen); 胡竹生; Institute of Electrical and Control Engineering
Keywords:	pose estimation; computer vision; depth sensor; point cloud; convolutional neural network
Issue Date:	2017
Abstract:	Pose estimation determines an object's position (x, y, z) and orientation (roll, pitch, yaw) in three-dimensional space; together these six parameters are called the pose. The existing pose estimation algorithm examined in this thesis operates on depth point cloud data and proceeds in three steps: (1) segment the individual objects in the point cloud; (2) obtain a coarse pose for each object to serve as the initial value for Iterative Closest Point (ICP); (3) let ICP converge to a refined pose. Verification and analysis revealed two problems with this algorithm. First, because the input is a depth point cloud, an abrupt change in depth can cause the segmentation step to mistake one object for two, which corrupts the subsequent steps. Second, the coarse pose used to initialize ICP is chosen according to the number of similar feature points, but these features are handcrafted, so recognition accuracy for more abstract features leaves room for improvement. To address both problems, this thesis proposes using two convolutional neural network architectures together with additional RGB image data: a Region Convolutional Neural Network (RCNN) replaces the first step and GoogLeNet replaces the second. Because training convolutional neural networks requires a large amount of data, the thesis also proposes a method for generating large quantities of synthetic training data for RCNN and GoogLeNet. Finally, simulation results show that adding the convolutional neural networks raises the pose recognition success rate from 60%-88% to 62%-95%, and tests under different noise intensities and lighting levels demonstrate a degree of robustness.
URI:	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070260080 http://hdl.handle.net/11536/142425
Appears in Collections:	Thesis
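
Illustrative note: the abstract defines a pose as six parameters (x, y, z, roll, pitch, yaw). The sketch below shows one common way to turn such a pose into a 4x4 homogeneous transform and apply it to a 3D point. It is not taken from the thesis. The function names `pose_to_matrix` and `transform_point`, the use of radians, and the rotation order R = Rz(yaw) · Ry(pitch) · Rx(roll) are all assumptions made for illustration; the thesis may use a different convention.

```python
import math


def pose_to_matrix(x, y, z, roll, pitch, yaw):
    """Build a 4x4 homogeneous transform from a six-DoF pose.

    Assumptions (not from the thesis): angles are in radians and the
    rotation is composed as R = Rz(yaw) * Ry(pitch) * Rx(roll).
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr, x],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr, y],
        [-sp, cp * sr, cp * cr, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(T, p):
    """Apply the transform T to a 3D point p: rotate first, then translate."""
    px, py, pz = p
    return tuple(
        T[i][0] * px + T[i][1] * py + T[i][2] * pz + T[i][3] for i in range(3)
    )
```

For example, with a translation of (1, 2, 3) and a yaw of 90 degrees (`math.pi / 2`), `transform_point(pose_to_matrix(1.0, 2.0, 3.0, 0.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0))` maps the point (1, 0, 0) to approximately (1, 3, 3). A transform of this form is the kind of coarse pose that could seed an ICP refinement step, and also the kind of refined pose ICP would return.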