Title: 利用三維模型訓練類神經網路的手勢辨識技術
Hand Gesture Recognition with Synthetically Trained Deep Learning
Authors: 許頌伶
蔡淳仁
Hsu, Sung-Ling
Tsai, Chun-Jen
Institute of Computer Science and Engineering
Keywords: Hand Gesture Recognition; Neural Networks; Deep Learning; 3D Hand Modeling
Issue Date: 2016
Abstract: This thesis presents Synthetically Trained Deep Learning (STDL), a hand gesture recognition method that trains a neural network on images rendered from a 3D hand model and then uses the trained network to classify 2D images of real hand gestures. STDL has two major parts. The first is 3D hand modeling in Blender, an open-source 3D computer graphics package. Constraints on the rotation angles of the finger joints are added so that the model moves more like a real palm and fingers. Three gesture sets are designed for different purposes, containing 243, 36, and 32 gestures respectively. The 243-gesture set covers the full range of finger rotation angles and can be used to estimate the bend of each finger more accurately; the 36-gesture set is designed around common gestures; and the 32-gesture set only identifies whether each finger is bent. For each gesture set, Blender generates the corresponding 2D training images (Blender images) from the 3D hand model. The second part is the deep learning module, which adopts the AlexNet architecture provided by Caffe, a widely used deep learning framework. The training and validation sets combine a large number of Blender images with a small number of real images. By adjusting the network's training parameters and the size and characteristics of these sets, the trained model successfully classifies real hand gesture images. The aim is to substantially reduce the amount of real image data that deep learning training requires.
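The sizes of two of the gesture sets are consistent with simple per-finger encodings: 243 = 3^5 and 32 = 2^5. The abstract does not spell out the encoding, so the Python sketch below is only an illustration under that assumption: three bend levels per finger for the 243-gesture set and two states per finger for the 32-gesture set. The finger names, level names, and the `enumerate_gestures` helper are hypothetical and are not taken from the thesis. The 36-gesture set is described as hand-designed from common gestures, so it is not enumerated here.

```python
from itertools import product

# Hypothetical finger names; the thesis abstract does not list them.
FINGERS = ("thumb", "index", "middle", "ring", "little")


def enumerate_gestures(levels):
    """Return every assignment of one bend level to each of the five fingers."""
    return [
        dict(zip(FINGERS, combo))
        for combo in product(levels, repeat=len(FINGERS))
    ]


# Assumption: three bend levels per finger, giving 3**5 = 243 gestures.
fine_set = enumerate_gestures(("straight", "half_bent", "bent"))

# Assumption: two states per finger, giving 2**5 = 32 gestures.
binary_set = enumerate_gestures(("straight", "bent"))

print(len(fine_set), len(binary_set))  # 243 32
```

Each dictionary in `fine_set` or `binary_set` could serve as one class label whose pose is then applied to the 3D hand model before rendering, though how the thesis actually maps labels to joint angles is not stated in this record.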
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356074
http://hdl.handle.net/11536/139377
Appears in Collections: Thesis