Title: 类神经网路影像检索之研究---数百万页影像检索系统之建构(VII)
The Study of Image Retrieval by Neural Networks---The Construction of a Multi-Million Pages Image Query and Retrieval System (VII)
Author: Fu Hsin-Chia (傅心家)
Department (Institute) of Computer Science, National Chiao Tung University
Keywords: Neural Network; image retrieval; image clustering; clustering algorithm; distributed system; distributed image retrieval; Grid computing
Issue date: 2010
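Abstract: With advances in digital imaging technology and the falling price of
high-capacity storage media, storing large volumes of image and video data has become
increasingly common. As the Internet has spread and communication quality has improved,
more and more digital media data, such as text, images, and video, are stored on the
Internet. How to obtain the desired information from the network accurately, effectively,
and quickly has become a problem that web search services urgently need to solve. This
long-term research project first targeted the Corel image database: in ROC fiscal year 94
(2005), we developed an image search engine that uses Visual keyword as the image index.
Preliminary experimental results showed that the hit rate for searching the Corel image
database with Visual keyword exceeded 90%. Subsequently, the projects of fiscal years
95-97 (2006-2008) extended this technique to image search on the Internet, building a
distributed image retrieval system composed of 20 image servers and expanding the search
scope to more than 200,000 Internet or WWW images. Preliminary results from fiscal year 97
show that the retrieval hit rate has reached more than 60%, with each query taking 1-5
seconds on average. This proposal plans a three-year research effort that, building on the
existing results and adding only 20-30 PCs, aims to search several (2-5) million images in
the same time (1-5 seconds) and give users the pictures they want (95% accuracy for the
20-30 images on the first page). The planned work items for the three years are: (1) in
the first year, complete a new SOM clustering algorithm and use the 200,000 images
collected so far to verify its effectiveness; (2) in the second year, expand the
collection of WWW images to one million and implement the clustering algorithm on a
distributed system of 30 PCs to measure its performance (including the computation time
required for clustering and the accuracy achieved); (3) in the third year, develop a
two-tier distributed computing architecture, consisting of (a) image clustering and (b)
image retrieval, to build a computer system that can search several million images with
only 50 PCs. Our target is to search 2-3 million images within 1-5 seconds, with 70%
overall accuracy or 95% accuracy on the first page (the first 20-30 images).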
The growth of the Internet and its services has caused a corresponding explosion in the
amount of media data that needs to be archived. The most important issue for a
web-based search service is how to retrieve the desired data from the Web efficiently and
correctly. In earlier projects, we proposed a novel image index called Visual keyword
and built an image search engine to retrieve the Corel gallery images. The experimental
results show that the hit rate of the retrieved images is about 90%. During the following 3
years, we built a web-based image search engine, a distributed system with 20
PC servers, to retrieve about 200,000 images on the Internet. The hit rate of the retrieved
images is about 60%, and the query-to-retrieval time is between 1 and 5 seconds. In this
proposal, we plan to build a distributed image retrieval system of 50 PC
servers that can search multi-million WWW images in 1-5 seconds with an average
accuracy of 70% and a first-page accuracy of 95%. We plan to achieve this goal in three years. In
the first year, we will develop a new probabilistic SOM-based clustering algorithm and
use the 200,000 WWW images already collected to test the performance of this new algorithm.
In the second year, we will implement and test the new clustering algorithm
on a distributed system with 30 PC servers, and collect one million WWW images
as test data. Finally, in the third year, we will build a two-layer computing architecture to
host the image clustering and retrieval functions. Our goal is to use the clustering
mechanism to reduce the computing time for searching for and matching similar images, so
that the 50-PC distributed computing system can achieve multi-million image
search and retrieval functionality.
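
The idea of clustering first and then searching only within the matching cluster can be illustrated with a minimal sketch. This is not the proposal's probabilistic SOM algorithm or its Visual keyword index: the fixed centroids stand in for trained cluster prototypes, and the 2-D feature vectors, image ids, and function names are all hypothetical.

```python
import math
from collections import defaultdict

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)), key=lambda i: distance(vector, centroids[i]))

def build_index(images, centroids):
    """Stage 1 (clustering): group image ids by their nearest centroid."""
    index = defaultdict(list)
    for image_id, features in images.items():
        index[nearest_centroid(features, centroids)].append(image_id)
    return index

def search(query, images, centroids, index, top_k=3):
    """Stage 2 (retrieval): rank only the images in the query's cluster."""
    cluster = nearest_centroid(query, centroids)
    candidates = index.get(cluster, [])
    return sorted(candidates, key=lambda i: distance(query, images[i]))[:top_k]

# Toy data (assumed for illustration): five images with 2-D features.
images = {
    "a": (0.1, 0.2), "b": (0.0, 0.0), "c": (0.3, 0.1),
    "x": (5.0, 5.1), "y": (4.8, 5.3),
}
# Hypothetical prototypes; a trained SOM would supply these instead.
centroids = [(0.0, 0.0), (5.0, 5.0)]

index = build_index(images, centroids)
result = search((0.2, 0.1), images, centroids, index, top_k=2)
print(result)
```

Here the query is compared against two centroids and three candidate images rather than all five images. In a distributed setting, one plausible arrangement (an assumption, not stated in the abstract) is for each PC to host a subset of clusters so that a query is routed only to the relevant machine.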
Official document #: NSC99-2221-E009-138
URI: http://hdl.handle.net/11536/100750
https://www.grb.gov.tw/search/planDetail?id=2112006&docId=337417
Appears in collections: Research Plans