Title: 基於學習法的行人辨識用之於RGB-D影像
Learning-based Pedestrian Detection Applied to RGB-D Images
Authors: 溫夏夢
杭學鳴
Patrisia, Sherryl Santoso
Hang, Hsueh-Ming
International Program in Electrical Engineering and Computer Science (電機資訊國際學程)
Keywords: Pedestrian Detection; RGB-D Images; CNNs
Issue Date: 2016
Abstract: Accurate pedestrian detection in complex real-world environments remains a challenging problem. To address it, we adopt the R-CNN method, which extracts robust features and also localizes objects. The pipeline starts with region proposals (Selective Search) to generate detection candidates, followed by deep learning (CNNs) to produce robust features. Because depth information is often helpful in detecting pedestrians and other objects, we use an RGB-D dataset and combine color image and depth map information for pedestrian detection.
In this thesis, we use a depth-encoding method to convert the original depth map to the HHA format so that it can be processed by CNNs. The HHA encoding has three channels: horizontal disparity, height above ground, and angle with gravity. We also adopt the Selective Search method to generate region proposals (object candidates), which can be computed from either the RGB or the HHA images. In our system, CNNs learn and extract features from candidates generated from either the RGB or the HHA images. We found that the two kinds of region proposals perform markedly differently on our pedestrian detection problem, with the HHA proposals giving much better results.
Going one step further, we combine the detection outputs produced from the RGB data and the HHA data. This information fusion can be inserted at different points in the system. We can process each data source (RGB and HHA) separately, examine their individual decisions (probabilities), and then make the final binary decision. Alternatively, we can combine the two feature spaces, in which case we add an SVM to make the final decision. We also use PCA to reduce redundant data in the fusion, and we design two variants: pre-PCA, which applies PCA before feature fusion, and post-PCA, which applies it after feature fusion.
The final experiments indicate that generating bounding boxes with Selective Search on the HHA images and then applying them to both the RGB and HHA images produces more robust region proposals. PCA reduces the number of features and keeps only the important ones. Finally, fusing the feature vectors of the RGB and HHA region proposals in combination with the pre-PCA method yields a good pedestrian detection rate and the lowest false positive rate.
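The difference between the pre-PCA and post-PCA fusion orders described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`pca_fit`, `pca_apply`, `pre_pca_fusion`, `post_pca_fusion`), the feature dimensions, the number of retained components `k`, and the random stand-in features are all assumptions made for illustration, and the SVM that makes the final decision is omitted.

```python
import numpy as np

def pca_fit(X, k):
    """Return (mean, components) for the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_apply(X, mean, components):
    """Project X onto the given principal directions."""
    return (X - mean) @ components.T

def pre_pca_fusion(rgb, hha, k):
    """Pre-PCA: reduce each modality to k dimensions, then concatenate."""
    rgb_red = pca_apply(rgb, *pca_fit(rgb, k))
    hha_red = pca_apply(hha, *pca_fit(hha, k))
    return np.hstack([rgb_red, hha_red])

def post_pca_fusion(rgb, hha, k):
    """Post-PCA: concatenate first, then reduce the joint vector to k dimensions."""
    fused = np.hstack([rgb, hha])
    return pca_apply(fused, *pca_fit(fused, k))

# Hypothetical stand-ins for CNN features of 50 region proposals (64-D each).
rng = np.random.default_rng(0)
rgb_feats = rng.normal(size=(50, 64))
hha_feats = rng.normal(size=(50, 64))

pre = pre_pca_fusion(rgb_feats, hha_feats, k=8)
post = post_pca_fusion(rgb_feats, hha_feats, k=8)
print(pre.shape, post.shape)  # (50, 16) (50, 8)
```

In this sketch, pre-PCA keeps a separate low-dimensional subspace for each modality, while post-PCA finds a single subspace for the joint RGB+HHA vector. In either case the fused features would then be passed to a classifier such as the SVM mentioned in the abstract.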
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356153
http://hdl.handle.net/11536/139039
Appears in Collections: Thesis