標題: 由分群而得的近似最近鄰居之搜尋法
Approximated Nearest Neighbor Searching Based on Clustering
作者: 莊豐誌
Zhuang, Feng-Zhi
林志青
Lin, Ja-Chen
多媒體工程研究所
關鍵字: 最近鄰居;近似最近鄰居;最近鄰居搜尋;分群;Nearest neighbor;Approximate Nearest neighbor;Nearest neighbor search;clustering
公開日期: 2015
摘要: 最近鄰居之搜尋(Nearest Neighbor Searching)是一個著名的監督式分類演算法,經常被使用在影像處理,數據挖掘,機器學習或者是資訊檢索等研究上。在低維度時,KD-tree有不錯的建造時間(O(logN))以及良好的query時間(O(NlogN)) 。而在高維度的資料時,為了query的效率,常常採用近似的方法來解決這種問題。本論文以文獻上所刊載的Inverted Multi-Index (IMI)為基礎,提出兩個在“Euclidean距離空間”下使用之近似最近鄰居之搜尋法,增進其效能。在第一個方法,我們改善了query的召回率,並且減少了整個模型的建造時間;在第二個方法,我們參考了隨機 KD-tree的方式來建構整個模型,實驗結果顯示,此方法不僅在召回率上有不錯的增幅,同時在query時所需付出的代價也非常的低。
Nearest Neighbor Searching is a well-known algorithm for classification, it can be applied to the research field such as image processing, data mining, machine learning or information retrieval. For low-dimensional data (usually less than 20 dimensions), KD-tree is a good solution, which is with short construction time (O(logN)) and query time (O(NlogN)). However, for high-dimensional data, due to efficiency of query, approximated nearest neighbor searching is more practical. In this thesis, based on the Inverted Multi-Index (IMI) method reported in literature, we propose two approximated nearest neighbor searching methods for Euclidean distance metric. In method 1, we improve the recall rate and reduce the construction time of the searching model. In method 2, we propose the randomized IMI to construct the searching model. Experimental results show that, for acceptable recall-rate, method 2 yields a recall rate slightly better than IMI does, without paying overhead on the query time.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070156643
http://hdl.handle.net/11536/126367
顯示於類別:畢業論文