Title: | Continuous Dynamic Hand Gesture Recognition based on Recurrent Neural Networks over Skeleton Information |
Authors: | 張繼宗 (Chang, Chi-Tsung); 王聖智 (Wang, Sheng-Jyh); 簡鳳村 (Chien, Feng-Tsun); Institute of Electronics |
Keywords: | Hand gesture recognition; Depth image; Machine learning; Deep learning; Hand skeleton |
Issue Date: | 2017 |
Abstract: | Hand gesture recognition is an active area of human-computer interaction. It can be divided into static and dynamic gesture recognition, and dynamic gesture recognition can be further divided into single (pre-segmented) gesture recognition and continuous gesture recognition. Existing studies of continuous gesture recognition usually assume that consecutive gestures are separated by a distinct segmenting action, and use that action as the basis for separating them.
In this thesis, we attempt continuous dynamic gesture recognition from hand skeleton information. We do not assume any segmenting action between gestures; instead, recognition is performed directly. We use an Intel RealSense RGBD camera, which comes with a library that can capture the hand skeleton. Since no existing dataset meets all of our requirements, we use the DHG (Dynamic Hand Gesture) 14/28 dataset, which contains 14 gesture classes and was recorded with the same camera as ours, but in which each sample contains only one gesture. In addition, we record a continuous-gesture test dataset, using the gestures defined in the DHG dataset, for evaluation.
We try two approaches to the problem: a sliding window method and Connectionist Temporal Classification (CTC) training. The sliding window method performs far worse than the CTC method at preventing false detections, while the CTC method can only be trained successfully after the original DHG dataset has been converted into a continuous gesture dataset. We also try to improve the model architecture, and we use a small amount of additionally recorded continuous gesture data to fine-tune the model. The final model performs well on both the DHG dataset and our recorded dataset. |
URI: | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070450233 http://hdl.handle.net/11536/142682 |
Appears in Collections: | Thesis |