Title: Real-Time Sign Language Recognition Based on Deep Learning for IoT Applications
Deep Learning Based Sign Language Recognition on Real-Time Augmented Reality for IoT
Authors: 楊柏漢
林寶樹
Yang, Po-han
Lin, Bao-Shuh
Institute of Multimedia Engineering
Keywords: Deep learning; Image recognition; Sign language recognition; Hand pose classification
Date of Issue: 2017
Abstract: In recent years, deep learning has risen to prominence in a remarkably short time, solving many problems that were previously intractable: from Google's AlphaGo in the much-discussed domain of game playing, to Facebook's face recognition and Apple's Siri speech recognition in the field of image and signal processing. All of these emerging products are built around deep learning, which has gradually become an indispensable tool for problem solving. Among these areas, image processing in particular has seen breakthroughs worth examining closely. In addition, much research indicates that we are entering the era of big data, meaning data can now be collected at larger scale and with greater breadth and diversity. This is a decisive advantage for deep learning models, enabling finer-grained image recognition. In this thesis, a single ordinary webcam (720p resolution at minimum) is used to recognize sign language; the low equipment threshold makes the system easier to promote and deploy. This is a goal achievable with deep learning: satisfactory accuracy on relatively low-end hardware. However, sign language consists of continuous motion, and accurate recognition requires a greater variety of hand-pose data, which makes deep learning central to this problem. This thesis therefore employs powerful deep learning models such as VGG16 (Visual Geometry Group 16) and NIN (Network in Network) to train the sign language recognizer. In the experiments, a large amount of data was collected to train the models. Test results show accuracy above 93% when recognizing a single sign, 73% accuracy over a set of 14 signs, and real-time, accurate prediction on an ordinary webcam. This thesis also compares the training results of the two deep learning models and optimizes the preprocessing so that the data better fits them. In addition, applications of sign language recognition are implemented, including message delivery over social networks and the control and operation of devices, opening up broader development and applications for sign language recognition.
Deep learning has been widely discussed in recent years, enabling solutions to problems in game playing, computer vision, and artificial intelligence. Sufficiently deep neural networks have proved remarkably effective on computer vision problems, from early tasks such as object recognition to state-of-the-art practice such as face and hand-pose classification. As a result, many difficult computer vision problems can now be solved with better results using deep learning. Recognizing what is being said in a sign language video demands high accuracy, high velocity, and high variety. Deep learning can classify images with high accuracy given GPU computing resources, which meets the requirements of video or webcam streaming. However, deep-learning-based image classification work on sign language remains limited, and the open question is which deep learning model can map a hand pose to a sign. To address this issue, we propose applying popular deep learning models to streaming sign language video, even from a webcam in real time. This lets a sign language user express what he or she is saying without requiring another person to interpret. In the experiments, the training data were collected from many sign language videos, and popular deep learning models were taken as references: a basic 4-layer neural network and NIN (Network in Network). Evaluation shows that adapting these models to our requirements increases accuracy on low-end equipment and helps avoid overfitting. This means we can more conveniently understand what a signer is saying, and signers can communicate with people through social network applications. At the same time, the system can transmit messages to control devices, building an Internet of Things environment.
The contribution of this thesis is a method that achieves high accuracy on low-end equipment in real time, together with a complete system for sign language recognition.
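Because sign language is continuous motion, per-frame classifier outputs from a webcam stream are typically noisy. A common stabilization step for real-time recognition is to majority-vote over a sliding window of recent frame predictions; the abstract does not specify the thesis's exact post-processing, so the class and parameter names below are a minimal illustrative sketch of this idea:

```python
from collections import Counter, deque

class PredictionSmoother:
    """Majority-vote smoothing over the last `window` per-frame labels."""

    def __init__(self, window=15):
        # Fixed-size buffer: old labels fall off automatically.
        self.history = deque(maxlen=window)

    def update(self, label):
        """Record one frame's predicted label and return the smoothed label."""
        self.history.append(label)
        # The most frequent label in the current window wins.
        return Counter(self.history).most_common(1)[0][0]

# Example: a brief misclassification ("B") and a transition from "A" to "C".
smoother = PredictionSmoother(window=5)
stream = ["A", "A", "B", "A", "A", "C", "C", "C", "C", "C"]
smoothed = [smoother.update(label) for label in stream]
```

With a window of 5, the single spurious "B" never surfaces, and the output switches to "C" only once "C" dominates the window, trading a few frames of latency for a stable on-screen result.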
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070456648
http://hdl.handle.net/11536/141581
Appears in Collections: Thesis