Title: | Research on data augmentation for people detection in aerial images |
Authors: | Liu, Che-Han; Tsai, Wen-Chin; Chen, Hua-Tsung; Institute of Multimedia Engineering |
Keywords: | drone; deep learning; aerial image; data augmentation |
Issue Date: | 2017 |
Abstract: | With recent advances in technology, drones have come into wide use. Unlike traditional surveillance cameras, which are fixed in place, a drone can be actively steered while it captures images, and this mobility gives drones considerable potential for tasks such as tracking and detection. Meanwhile, progress in high-performance graphics processing units (GPUs) has made deep learning a very active research area: given a good training architecture and sufficient training data, deep learning can be more accurate and faster than traditional methods in many fields. Deep learning depends heavily on training data, however. Many datasets are available for model training and testing, such as ImageNet, Pascal VOC, and MS COCO, but they consist of general images rather than aerial images, and aerial image datasets are still scarce. Because general images differ noticeably from aerial images, weights trained only on general images give unsatisfactory results when tested on aerial images.
In this thesis, we apply deep learning to people detection in aerial images, using the state-of-the-art detection model YOLO. Without training on any aerial images, we use data augmentation to make general images resemble aerial images, so that they can be used for model training and improve the detection results. Because common augmentation methods are not suited to aerial images, we first observe the differences between general images and aerial images and then propose three augmentation methods: border padding, image rotation, and perspective transformation. These methods make people in general images appear closer to people in aerial images. The results show that the proposed methods improve both recall and precision. In addition, splitting an aerial image into sub-images before detection also improves the results. |
URI: | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070456634 http://hdl.handle.net/11536/142254 |
Appears in Collections: | Thesis |
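
The three augmentation methods named in the abstract (border padding, image rotation, perspective transformation) are all planar transforms, so each can be written as a 3x3 homogeneous matrix. The sketch below is not the thesis implementation, which this record does not describe. It only illustrates, under that assumption, how a person's bounding box would follow the image through each transform. All function names and parameters (`padding_matrix`, `rotation_matrix`, `perspective_matrix`, `px`, `py`, and so on) are hypothetical, and the pixel warping itself is left out.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def padding_matrix(left, top):
    """Border padding: every pixel shifts by the left and top pad widths."""
    return [[1.0, 0.0, float(left)],
            [0.0, 1.0, float(top)],
            [0.0, 0.0, 1.0]]

def rotation_matrix(angle_deg, cx, cy):
    """Rotation by angle_deg about the point (cx, cy).

    With the image y-axis pointing down, a positive angle appears
    clockwise on screen.
    """
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, cx - c * cx + s * cy],
            [s, c, cy - s * cx - c * cy],
            [0.0, 0.0, 1.0]]

def perspective_matrix(px, py):
    """A simple projective warp; px and py are illustrative parameters
    that control how strongly the scale changes along x and y."""
    return [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [float(px), float(py), 1.0]]

def apply(h, point):
    """Map one (x, y) point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def transform_box(h, box):
    """Map an axis-aligned box (x1, y1, x2, y2) through h and return the
    axis-aligned box that encloses the four transformed corners."""
    x1, y1, x2, y2 = box
    corners = [apply(h, p) for p in ((x1, y1), (x2, y1), (x2, y2), (x1, y2))]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (min(xs), min(ys), max(xs), max(ys))

if __name__ == "__main__":
    person = (10.0, 20.0, 30.0, 60.0)
    # Compose: pad first, then rotate, then apply the projective warp.
    h = matmul(perspective_matrix(0.001, 0.0),
               matmul(rotation_matrix(30, 50, 50), padding_matrix(5, 5)))
    print(transform_box(h, person))
```

Composing the matrices before mapping the box keeps the label in step with the image however many transforms are chained. Taking the enclosing axis-aligned box after rotation or perspective warping is a common simplification that can loosen the label slightly.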