Title: A Study on Video Classification Using Mixture of Convolutional Neural Networks
Authors: Sun, Man-Chin (孫曼津)
Wang, Sheng-Jyh (王聖智)
Chien, Feng-Tsun (簡鳳村)
Department of Electronics Engineering and Institute of Electronics
Keywords: Deep Learning; Video Classification; Neural Network; Convolutional Neural Network
Issue Date: 2016
Abstract: With the rapid growth of multimedia on the Internet, video classification has become an important problem. The Convolutional Neural Network (CNN) is the state-of-the-art architecture for image classification and has been applied to a wide variety of tasks. In this thesis, three CNN architectures for video classification are proposed. First, the Multi-stream CNN contains several streams with different kernel sizes for learning spatiotemporal features at different temporal resolutions; in addition, a difference stream takes the temporal differences of the input, allowing the network to learn more temporal information. Second, the Motion CNN uses motion-estimation information to learn object movements in the video. Finally, the Mixture of CNNs combines the two former approaches, and experiments show that it achieves better video classification accuracy. All three models take only high-level feature maps extracted from an image-trained CNN as input; this not only yields accurate video classification but also greatly reduces the computational cost thanks to the lower input dimensionality.
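The following is a minimal, illustrative sketch (written in PyTorch, which the thesis does not necessarily use) of the multi-stream idea summarized above: several temporal-convolution streams with different kernel sizes operate on high-level features extracted by an image-trained CNN, and an additional difference stream operates on the frame-to-frame differences of those features. All layer sizes, kernel widths, stream counts, and the number of classes here are assumptions chosen only for demonstration, not values taken from the thesis.

    # Illustrative sketch (not the thesis implementation) of a multi-stream
    # temporal CNN over pre-extracted CNN features, plus a "difference" stream
    # that operates on frame-to-frame feature differences. All dimensions and
    # kernel sizes below are assumptions for illustration only.
    import torch
    import torch.nn as nn

    class MultiStreamTemporalCNN(nn.Module):
        def __init__(self, feat_dim=2048, num_classes=101,
                     kernel_sizes=(3, 5, 7), channels=256):
            super().__init__()
            # One temporal-convolution stream per kernel size
            # (i.e., per temporal resolution).
            self.streams = nn.ModuleList(
                nn.Sequential(
                    nn.Conv1d(feat_dim, channels, k, padding=k // 2),
                    nn.ReLU(),
                    nn.AdaptiveMaxPool1d(1),
                )
                for k in kernel_sizes
            )
            # Difference stream: same structure, but fed with temporal
            # differences of the input features.
            self.diff_stream = nn.Sequential(
                nn.Conv1d(feat_dim, channels, 3, padding=1),
                nn.ReLU(),
                nn.AdaptiveMaxPool1d(1),
            )
            self.classifier = nn.Linear(channels * (len(kernel_sizes) + 1),
                                        num_classes)

        def forward(self, feats):
            # feats: (batch, time, feat_dim) high-level features produced by
            # an image-trained CNN, one feature vector per video frame.
            x = feats.transpose(1, 2)            # (batch, feat_dim, time)
            diffs = x[:, :, 1:] - x[:, :, :-1]   # temporal differences
            pooled = [s(x).squeeze(-1) for s in self.streams]
            pooled.append(self.diff_stream(diffs).squeeze(-1))
            return self.classifier(torch.cat(pooled, dim=1))

    # Example: classify a clip of 16 frames whose frames were already encoded
    # by an image-trained CNN into 2048-dimensional feature vectors.
    model = MultiStreamTemporalCNN()
    clip_features = torch.randn(2, 16, 2048)     # (batch, time, feat_dim)
    logits = model(clip_features)                # shape (2, 101)

In this sketch each stream pools its responses over time before all streams are concatenated and classified; the Motion CNN and the final Mixture of CNNs described in the abstract are not reproduced here.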
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070150127
http://hdl.handle.net/11536/143135
Appears in Collections: Thesis