A Multiple-Instance Neural Networks for Content-based Image Retrieval

Title:	A Multiple-Instance Neural Networks for Content-based Image Retrieval (多實例類神經網路影像檢索之研究)
Authors:	莊舜清 Chuang, Shun-Chin; 傅心家 Fu, Hsin-Chia; Institute of Computer Science and Engineering
Keywords:	Multiple-Instance learning; Content-based Image Retrieval; Bootstrap method; Neural Networks
Issue Date:	2010
Abstract:	This dissertation proposes a novel Multiple-Instance Neural Network (MINN) system for content-based image retrieval. Because automatically and correctly segmenting an image into regions is still a difficult task in image processing, the system recasts image retrieval as a multiple-instance learning problem, which lets it represent the content of an image without precise segmentation. To solve the resulting multiple-instance problem, a statistical MINN is proposed that approximates the true distribution of the query images more precisely than previous approaches. Two novel image representations are also proposed: (1) the Weighted Color Histogram and (2) the Weighted Texture Histogram. To obtain features, the user first clicks several important positions on the query image; these mark the user's regions of interest (R.O.I.) and serve as the instances that represent the user's concept. Around each selected instance, a Gaussian-like mask is used to weight the two histograms in the Lab color space, and each weighted histogram is then approximated by a Gaussian mixture. The parameters of the fitted mixtures are the image features fed to the MINN, and a measure of similarity between two distributions is proposed for training the network on the concept the user wants.
When approximating the histograms with Gaussian mixtures, determining the proper number of components in the mixture model of each class, that is, the model selection problem, is an important issue, and it becomes harder when very little training data is available. A weighted bootstrap method is used as an empirical aid to make this selection more reliably. A prototype MINN retrieval system was implemented. Following the experimental procedures of IRM and UFM, ten categories of 100 images each were selected from the COREL gallery, and two experiments were designed: (1) feedback supplied automatically by the system and (2) feedback from real users. The results show that the MINN system can learn the user's intended visual concept and retrieve the desired images.
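As an illustration of the weighting step described in the abstract, the following minimal Python sketch builds a histogram of one image channel in which each pixel's vote is scaled by a Gaussian of its distance from a clicked point. The function names, the isotropic mask with parameter sigma, the bin count, and the normalization are assumptions made for illustration; the actual mask and histogram definitions used in the thesis are not given in this record.

```python
import math


def gaussian_weight(dx, dy, sigma):
    """Weight of a pixel at offset (dx, dy) from the clicked point.

    Assumption: an isotropic Gaussian stands in for the "Gaussian-like mask".
    """
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def weighted_histogram(channel, center, sigma, bins, value_range):
    """Gaussian-weighted, normalized histogram of one image channel.

    channel     -- 2-D list of pixel values (rows of floats)
    center      -- (row, col) of the user-selected instance
    sigma       -- spread of the mask in pixels (hypothetical parameter)
    bins        -- number of histogram bins
    value_range -- (low, high) range covered by the bins
    """
    low, high = value_range
    bin_width = (high - low) / bins
    hist = [0.0] * bins
    center_row, center_col = center
    for row_idx, row in enumerate(channel):
        for col_idx, value in enumerate(row):
            # Clamp so that values on or beyond the range edges land in
            # the first or last bin.
            idx = int((value - low) / bin_width)
            idx = min(max(idx, 0), bins - 1)
            hist[idx] += gaussian_weight(
                col_idx - center_col, row_idx - center_row, sigma
            )
    total = sum(hist)
    # Normalize so the histogram sums to 1 and can be treated as a density.
    return [h / total for h in hist] if total > 0 else hist


if __name__ == "__main__":
    # Toy 3x3 "L" channel: one bright pixel at the clicked position.
    toy = [[0.0, 0.0, 0.0],
           [0.0, 99.0, 0.0],
           [0.0, 0.0, 0.0]]
    print(weighted_histogram(toy, center=(1, 1), sigma=0.3,
                             bins=2, value_range=(0.0, 100.0)))
```

With a small sigma the bin containing the clicked pixel dominates; with a large sigma the result approaches an ordinary unweighted histogram. A normalized histogram of this kind is what a mixture model would then be fitted to.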
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT078917813 http://hdl.handle.net/11536/40230
Appears in Collections:	Thesis

Files in This Item:

781301.pdf
