Title: Multi-Query Image Retrieval using CNN and SIFT Techniques
Authors: Huang, Shiuan
Hang, Hsueh-Ming
Department of Electronics Engineering and Institute of Electronics
Keywords: Convolutional Neural Network (CNN); Scale-Invariant Feature Transform (SIFT); multi-query; image retrieval; content-based retrieval; Siamese-triplet network
Issue Date: 2016
Abstract: With the rapid growth in the number of images, content-based image retrieval over large databases has become an essential tool in image processing. Although many studies have been published on this topic, advanced searches remain challenging, for example, retrieving a specific building from a viewpoint different from the camera angles used in the database. In addition, if the user can provide additional images as a second or third query, how do we combine the information provided by these multiple queries? We therefore propose a multi-query fusion method to achieve higher retrieval accuracy.
In this study, we test two types of features for retrieval. We adopt the Scale-Invariant Feature Transform (SIFT) feature as the low-level feature and the Convolutional Neural Network (CNN) feature as the high-level feature. For the SIFT features, retrieval is performed with a Bag-of-Words model weighted by Term Frequency-Inverse Document Frequency (TF-IDF). For the CNN features, AlexNet is adopted as the base model and extended into a Siamese-triplet network to suit the image retrieval task; the network weights are pre-trained on ImageNet and fine-tuned on a landmark dataset. We use the CNN as a feature extractor rather than a classifier, so the loss function measures the similarity between the query image and its similar and dissimilar counterparts.
We also propose fusion methods at several levels. The first combines the 6th-layer and 7th-layer features of the CNN; the second combines the information provided by the SIFT and CNN features; the third combines the information provided by multiple query images. When the data permit, we apply both the early-fusion and the late-fusion schemes. Our best single-query method outperforms most state-of-the-art retrieval methods, and multi-query retrieval further increases the retrieval accuracy.
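The Python sketches below illustrate two of the ideas summarized in the abstract. They are minimal examples written for this record, not the thesis implementation; the function names, the margin value, and the use of L2-normalized features with cosine similarity are assumptions made here for illustration.

First, a triplet-style loss in the spirit of the Siamese-triplet training described above: it penalizes the network unless the query (anchor) feature lies closer to a similar image than to a dissimilar one by at least a margin.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Normalize feature vectors to unit length so squared Euclidean
    # distance relates monotonically to cosine similarity.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def triplet_loss(anchor, positive, negative, margin=0.5):
    # Hinge-style triplet loss: push the anchor closer to the similar
    # (positive) image than to the dissimilar (negative) one by `margin`.
    a, p, n = map(l2_normalize, (anchor, positive, negative))
    d_pos = np.sum((a - p) ** 2, axis=-1)
    d_neg = np.sum((a - n) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```

Second, a sketch of the early-fusion versus late-fusion idea for multiple queries: early fusion merges the query features before a single retrieval pass, while late fusion retrieves with each query separately and averages the per-query similarity scores before ranking.

```python
import numpy as np

def retrieve_scores(query_feat, db_feats):
    # Cosine similarity between one query and every database image.
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    return db @ q

def early_fusion(query_feats, db_feats, top_k=10):
    # Early fusion: merge the queries first (mean feature), then retrieve once.
    merged = query_feats.mean(axis=0)
    return np.argsort(-retrieve_scores(merged, db_feats))[:top_k]

def late_fusion(query_feats, db_feats, top_k=10):
    # Late fusion: retrieve with each query separately, then average the
    # per-query similarity scores before producing the final ranking.
    scores = np.mean([retrieve_scores(q, db_feats) for q in query_feats], axis=0)
    return np.argsort(-scores)[:top_k]
```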
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070350221
http://hdl.handle.net/11536/138734
Appears in Collections: Thesis