The Study of Image Retrieval by Neural Networks---The Construction of a Multi-Million Pages Image Query and Retrieval System (VII)

Title:	The Study of Image Retrieval by Neural Networks---The Construction of a Multi-Million Pages Image Query and Retrieval System (VII)
Author:	Fu Hsin-Chia (傅心家), Department and Institute of Computer Science and Information Engineering, National Chiao Tung University
Keywords:	Neural Network;image retrieval;image clustering;clustering algorithm;distributed system;distributed image retrieval;Grid computing
Issue Date:	2010
Abstract:	Advances in digital imaging and the falling price of high-capacity storage have made it increasingly common to store large amounts of image and video data. As Internet access spreads and communication quality improves, more and more digital media data, such as text, images, and video, is stored on the Internet, causing a corresponding explosion in the amount of media data that must be archived. Retrieving the desired information from the Web precisely, efficiently, and quickly is therefore a central problem for web search services.
In ROC year 94 (2005), this long-term project proposed a novel image index called the Visual keyword and built an image search engine on it for the Corel image database. Preliminary experiments showed a hit rate of over 90% when searching the Corel database with Visual keywords. In ROC years 95 to 97 (2006 to 2008), the project extended this technique to image search on the Internet, building a distributed image retrieval system of 20 PC image servers that covers more than 200,000 Internet (WWW) images. Preliminary results in 2008 showed a retrieval hit rate above 60%, with each query taking 1 to 5 seconds on average.
This proposal plans a three-year research effort. Building on the existing results and adding only 20 to 30 PCs, we aim to search several (2 to 5) million WWW images in the same 1 to 5 seconds per query and return the images users want, with 95% accuracy on the first page (the top 20 to 30 images). The planned work is as follows:
(1) Year 1: develop a new probabilistic SOM-based clustering algorithm and evaluate it on the 200,000 WWW images collected so far.
(2) Year 2: expand the WWW image collection to one million images as test data, then implement and test the clustering algorithm on a distributed system of 30 PC servers, measuring both the computing time required for clustering and the accuracy achieved.
(3) Year 3: develop a two-layer distributed computing architecture that hosts (a) image clustering and (b) image retrieval, and use it to build a 50-PC system capable of searching millions of images.
The clustering mechanism is intended to reduce the computing time needed to search for or match similar images, so that the 50-PC distributed system can provide multi-million-image search and retrieval. Our target is to search 2 to 3 million images in 1 to 5 seconds with an overall accuracy of 70%, or 95% accuracy on the first page (the top 20 to 30 images).
Gov't Doc #:	NSC99-2221-E009-138
URI:	http://hdl.handle.net/11536/100750 https://www.grb.gov.tw/search/planDetail?id=2112006&docId=337417
Appears in Collections:	Research Plans
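The abstract's central idea is to cluster the image collection first (a SOM-based clustering layer) and then match each query only against the images in the most similar cluster(s) (a retrieval layer), so that the number of comparisons per query grows more slowly than the collection. The following is a minimal Python sketch of that general idea under loudly stated assumptions: it uses a plain 1-D Kohonen SOM, not the proposal's probabilistic SOM; toy 2-D points stand in for image feature vectors (the Visual keyword features are not described in this record); and every function name (`train_som`, `build_index`, `search`, `brute_force`) and parameter (`lr0`, `sigma0`, `n_probe`) is hypothetical.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, n_units, epochs=30, lr0=0.2, sigma0=1.0, seed=0):
    """Train a plain 1-D Kohonen SOM (an assumption; NOT the proposal's probabilistic SOM)."""
    rng = random.Random(seed)
    # Hypothetical deterministic init: evenly spaced samples taken from the data.
    units = [list(data[i * len(data) // n_units]) for i in range(n_units)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly toward 0
            sigma = sigma0 * (1.0 - frac) + 0.3  # neighborhood width shrinks but stays > 0
            bmu = min(range(n_units), key=lambda u: dist2(units[u], x))
            for u in range(n_units):
                # Gaussian neighborhood on the 1-D unit grid around the best-matching unit.
                h = math.exp(-((u - bmu) ** 2) / (2.0 * sigma ** 2))
                units[u] = [w + lr * h * (xd - w) for w, xd in zip(units[u], x)]
            step += 1
    return units


def build_index(data, units):
    """Clustering layer: assign every image to its best-matching SOM unit."""
    clusters = {u: [] for u in range(len(units))}
    for idx, x in enumerate(data):
        bmu = min(range(len(units)), key=lambda u: dist2(units[u], x))
        clusters[bmu].append(idx)
    return clusters


def search(query, data, units, clusters, top_k=3, n_probe=1):
    """Retrieval layer: scan only the n_probe nearest non-empty clusters.

    Returns (ranked image ids, number of images compared against the query).
    """
    by_distance = sorted(range(len(units)), key=lambda u: dist2(units[u], query))
    probed = [u for u in by_distance if clusters[u]][:n_probe]
    candidates = [i for u in probed for i in clusters[u]]
    ranked = sorted(candidates, key=lambda i: dist2(data[i], query))
    return ranked[:top_k], len(candidates)


def brute_force(query, data, top_k=3):
    """Baseline without clustering: compare the query against every image."""
    return sorted(range(len(data)), key=lambda i: dist2(data[i], query))[:top_k]


# Toy stand-in for image feature vectors: three well-separated 3x3 grids of 2-D points.
centers = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
data = [(cx + dx, cy + dy) for cx, cy in centers
        for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
units = train_som(data, n_units=3)
clusters = build_index(data, units)
ids, compared = search((0.2, 0.1), data, units, clusters)
print(ids, compared, "of", len(data))
```

On this toy data the pruned search compares the query against only one cluster instead of the whole collection. Raising `n_probe` trades speed for recall: scanning more clusters costs more comparisons but reduces the chance of missing a true match that fell into a neighboring cluster.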