Title: On Improvement of CNN's Scale Invariance
Authors: Sie Syuan-Yu; Peng Wen-Hsiao
Institute of Multimedia Engineering
Keywords: visual search; convolutional neural networks; scale invariance
Issue Date: 2015
Abstract: The deep-learning-based convolutional neural network (CNN) has recently been applied widely to various image recognition tasks, owing to its superior ability to extract higher-level features, such as objects or parts, from an image. Its performance, however, was found to be susceptible to image transformations, including translation, scaling, and rotation. To improve its scale invariance, this thesis takes a three-pronged approach, addressing the structure of the CNN, the training process, and the testing process. Specifically, inspired by the design of SIFT, we introduce filters of different sizes into the CNN pipeline, hoping to capture more meaningful features that may vary in size. In the training process, we augment the training data with images of different scales, so that the weight parameters of the CNN can adapt themselves to variable-size features. During the testing stage, we pass multiple replicas of the query image, transformed with cropping and/or scaling, through the CNN and pool their outputs together for a more accurate prediction. Extensive experiments have been carried out to analyze the benefit of each of these enhancements, as well as their combined effect, in terms of recognition accuracy.
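To make the first enhancement concrete, the sketch below shows one way a layer could apply convolution kernels of several sizes in parallel and concatenate their feature maps. It is a minimal illustration assuming PyTorch; the class name, kernel sizes, and channel counts are hypothetical and not taken from the thesis.

    import torch
    import torch.nn as nn

    class MultiSizeConv(nn.Module):
        # Hypothetical layer: 3x3, 5x5, and 7x7 kernels run in parallel so that
        # features of different spatial extents can be captured at the same depth.
        def __init__(self, in_channels, channels_per_branch):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Conv2d(in_channels, channels_per_branch, kernel_size=k, padding=k // 2)
                for k in (3, 5, 7)
            ])

        def forward(self, x):
            # Concatenate the per-branch feature maps along the channel axis.
            return torch.cat([branch(x) for branch in self.branches], dim=1)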
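For the second enhancement, scale augmentation of the training data could be expressed as a preprocessing pipeline like the one below, again assuming torchvision; the crop size and scale range are illustrative assumptions rather than the settings used in the thesis.

    from torchvision import transforms

    # Hypothetical training-time augmentation: each image is randomly rescaled
    # before being cropped to the network input size, so the CNN sees the same
    # objects at different scales during training.
    scale_augmentation = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])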
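The third enhancement, pooling predictions over rescaled and cropped replicas of the query image at test time, could be sketched as follows. This also assumes PyTorch/torchvision; the function name, scale factors, and crop size are assumptions, not the thesis implementation.

    import torch
    import torchvision.transforms.functional as TF

    def pooled_prediction(model, image, scales=(0.8, 1.0, 1.2), crop_size=224):
        # image: a (C, H, W) tensor; scales and crop_size are hypothetical values.
        model.eval()
        probs = []
        with torch.no_grad():
            for s in scales:
                h, w = image.shape[-2:]
                resized = TF.resize(image, [int(h * s), int(w * s)])
                cropped = TF.center_crop(resized, [crop_size, crop_size])
                logits = model(cropped.unsqueeze(0))
                probs.append(torch.softmax(logits, dim=1))
        # Average the class probabilities over all transformed replicas.
        return torch.stack(probs).mean(dim=0)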
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070256625
http://hdl.handle.net/11536/127280
Appears in Collections: Thesis