Title: | Semantics-based Image Retrieval |
Authors: | 吳韋良 李嘉晃 劉建良 莊仁輝 Wu, Wei-Liang Institute of Computer Science and Engineering |
Keywords: | Content-based Image Retrieval;Neural Network;Deep Learning;Convolutional Neural Network;Region-based Convolutional Network;Latent Semantic Space |
Publication Date: | 2016 |
Abstract: | Image search is an important technique in multimedia applications, and query-by-image retrieval is one of its most common forms: given a query image, the system retrieves similar images from an image database. Most existing approaches extract visual features from the query image and the database images and compare those features to compute similarity. Because such approaches depend solely on image-specific features, they cannot retrieve images that are semantically related to the query but differ substantially in visual features. Many photo websites let users attach tags or descriptions to the photos they upload, which motivates a semantics-based image retrieval framework that uses both the image content and the accompanying descriptions, built on deep learning techniques. Given a query image, the proposed method extracts the likely objects in the image, classifies them into predefined category labels, and converts those labels into word vectors in a semantic space. The textual descriptions of the database images are projected into the same latent semantic space using deep neural networks, and the semantic similarity between the query and the descriptions is computed in that space. This thesis conducts experiments on images and descriptions from Flickr and evaluates the results by the average number of irrelevant images among the retrieved results. The experimental results indicate that, when only an image is given as the query, the proposed method retrieves more acceptable images than the compared methods. |
URI: | http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356161 http://hdl.handle.net/11536/139767 |
Appears in Collections: | Thesis |
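
The retrieval idea summarized in the abstract (query labels and database descriptions compared in a shared semantic space) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the three-dimensional `TOY_VECTORS` are hand-written stand-ins for learned word vectors, averaging is assumed as the way to combine words, and the names `embed`, `rank_images`, and `average_irrelevant` are hypothetical. The evaluation function is one plausible reading of "average number of irrelevant images"; the record does not give the exact definition.

```python
import math

# Assumption: toy word vectors standing in for a learned latent semantic space.
TOY_VECTORS = {
    "dog": (0.9, 0.1, 0.0),
    "puppy": (0.8, 0.2, 0.0),
    "park": (0.3, 0.7, 0.1),
    "grass": (0.2, 0.8, 0.1),
    "car": (0.0, 0.1, 0.9),
    "street": (0.1, 0.3, 0.8),
}


def embed(words):
    """Average the vectors of the known words; return None if none are known."""
    known = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not known:
        return None
    return tuple(sum(component) / len(known) for component in zip(*known))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_images(query_labels, descriptions):
    """Rank image ids by similarity between query labels and each description.

    query_labels: object labels assumed to come from an object detector.
    descriptions: dict mapping image id to its free-text description.
    """
    query_vec = embed(query_labels)
    scored = []
    for image_id, text in descriptions.items():
        desc_vec = embed(text.lower().split())
        if query_vec is None or desc_vec is None:
            score = 0.0
        else:
            score = cosine(query_vec, desc_vec)
        scored.append((score, image_id))
    # Highest score first; ties broken by image id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image_id for _, image_id in scored]


def average_irrelevant(results_per_query, relevant_per_query, k):
    """Mean count of irrelevant images among the top-k results of each query."""
    counts = []
    for query_id, results in results_per_query.items():
        relevant = relevant_per_query.get(query_id, set())
        counts.append(sum(1 for r in results[:k] if r not in relevant))
    return sum(counts) / len(counts)


if __name__ == "__main__":
    database = {
        "img_a": "a puppy in the park",
        "img_b": "car on the street",
        "img_c": "dog",
    }
    print(rank_images(["dog", "grass"], database))
```

In this sketch the query never has to share a word with a description: `img_a` ranks first for the labels `dog` and `grass` because `puppy` and `park` lie close to them in the toy space, which is the kind of semantic match that pure visual-feature comparison would miss.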