標題: 以網格為基礎的鄰近密集區域查詢之研究
Grid-Based Nearest Dense Window Query Processing
作者: 蘇庭昱
Su,Ting-Yu
黃俊龍
資訊科學與工程研究所
關鍵字: 網格式;鄰近密集區域;四元樹;空間資料查詢;地理位置服務;grid;nearest dense window;quadtree;spatial query processing;location-based service
公開日期: 2016
摘要: 近幾年因為各種地理位置服務 (Location Based Service LBS) 急速增加,各種 spatial query 和相關應用相繼提出,其原因在於智慧型攜帶裝置和行動網路所提供的便利性。隨著攜帶裝置的普及和使用者數量的增加,每天都能產生巨量的資料,為了能夠加速處理這些巨量資料,我們希望能透過分散式運算來加快處理的時間,如 Hadoop 裡的 Map/Reduce 。在本篇論文中,我們著重在 Nearest Dense Window Query (NDW) 的研究, NDW 是針對資料密度的查詢,使用者可以輸入一個查詢點、希望得到的空間大小和空間內資料數量的門檻, NDW 會回傳最靠近查詢點且符合給定的空間大小,並且空間內的資料數量大於給定的資料數量門檻,因為查詢結果須符合給定的資料密度,所以稱做 dense window 。在原有的 NDW 演算法中,主要使用了 R-tree 來建立 spatial index ,但是 R-tree 若是要應用在 Map/Reduce 中,會因為 node 間有 overlapping 的問題,使得實作較為困難,且效能較差,所以我們在本篇論文中將 spatial index 改為 grid index 來做資料儲存,一方面 grid index 在 Map/Reduce 中的效能較 R-tree 好,另外在原本 NDW 演算法中有部分方法有用到 grid index 來幫助 optimize ,所以改用 grid index 也能同時處理 optimize ,節省部分儲存空間。另外,我們還加入了 cache 來節省資料存取的時間,以改進 grid index 在處理速度較慢的問題。
Recently, various location based services (LBS) are growing vigorously, thanks to the convenience of mobile device and wireless networks systems. With the ubiquity of mobile devices and growing of users, a huge amount of data will be recorded. In order to speed up the process on huge data volumn, we want to use distributed systems to deal with the data like Map/Reduce in Hadoop. In this paper, we focus on the study of Nearest Dense Window Query (NDW). NDW is a query that specifies the density of data. Users give a query point, the size of specified space, and the specified density of data in this space. According to the density, NDW will return a space that is nearest to the space and the space size is the same as user's demand. Since the result must agree with user's demand, this space we call it dense window. In the original NDW algorithm, it uses R-tree to build the spatial index. But R-tree has the problem about overlapping between nodes when it is applied on Map/Reduce. This will make the programming procedure more difficult, and the performance will be worse. In this paper, we replace the spatial index of R-tree with grid index for data storage because of the better performance of grid index on Map/Reduce. Besides, some original NDW algorithms already use grid index to help optimizing. We thus adopt grid index to handle this optimization at the same time, to save of storage space. In addition, we add cache to save the time on data access and meanwhile mitigate improve the problem about lower processing speed on grid index.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070256036
http://hdl.handle.net/11536/127188
顯示於類別:畢業論文