標題: 在布隆過濾器下改善範圍搜尋方法
The Bloom Filter Design for Numerical Range Query
作者: 范坤揚
Kun-Yang Fan
張明峰
Ming-Feng Chang
網路工程研究所
關鍵字: 布隆過濾器;範圍搜尋;Bloom filter;Range Query
公開日期: 2007
摘要: 布隆過濾器是一個簡單具有空間效率並且隨機的資料結構,被用作簡明的資料集合的表示方法。它的隨機特性具有相當大潛力被應用在分散式網路系統,並且支援系統內的會員資格詢問加上較低的錯誤判正率。布隆過濾器的錯誤判正率是一種事件發生機率,代表一個不屬於資料集合的元素被布隆過濾器判定為屬於資料集合內。目前已經有很多的研究在於如何降低布隆過濾器的錯誤判正率,但是相當少的文獻探討在針對數值範圍搜尋應用下改善布隆過濾器。然而,很少的文獻在探討針對數值範圍詢問的布隆過濾器設計。由於布隆過濾器只能表示有限制的原素個數,當一個大的數值範圍插入布隆過濾器時,錯誤判正率會急劇的升高。在本篇論文中,我們針對數值範圍提出有效率的布隆過濾器的設計。首先,區間方法利用將數值範圍分群到區間的方式來降低插入的元素個數,因此數值在同一區間會被視為同一元素。另一方面,重疊方法利用連續數值的插入位元的重複來降低布隆過濾器的插入位元數。另外,區間和重複的方法結合之前所提到的方法。分析模型被用來找出各個方法的錯誤判正率。電腦模擬被用來驗證我們分析模型的正確性。更進一步,布隆過濾器表示連續範圍之最佳的參數設定可以被得到,因此錯誤判正率會被減到最小。一個啟發式演算法已經被提出用來找在多重屬性下的近似最佳參數設定。區間和重疊的方法針對數值範圍延伸布隆過濾器的設計當傳統的布隆過濾器不能在被使用。
A Bloom filter is a simple space-efficient randomized data structure for concisely representing a data set. The property of its randomization has great potential for distributed network systems, and it supports the membership query with a small false positive rate, which is the probability that an element was not in the data set but Bloom filter reported it is. There have been many studies on how to improve the correctness of Bloom filter by reducing the false positive rate. However, little research has been done on Bloom filter design for numerical range query. Since a Bloom filter can only represent a limited number of elements, when a large range of numerical attributes are inserted into a Bloom filter, the false positive rate increases dramatically. In this thesis we present efficient Bloom filter design for numerical ranges. First, Division scheme reduces the number of elements inserted by grouping the numerical range into divisions, i.e., numbers in the same division are treated as the same element. On the other hand, Overlapping scheme reduced the number of bits inserted in the Bloom filter by overlapping the inserted bits of consecutive numbers. In addition, Division and Overlapping scheme combines the techniques of the aforementioned two schemes. Analytic model was used to derive the false positive rates of the schemes. Computer simulations were used to verify the correctness of the analytic model. Moreover, the optimal configuration of Bloom filter representing a numeric range of single attribute can be obtained, i.e., the false positive rate is minimized. A heuristic algorithm has been developed to obtain near optimal configurations for multiple attributes. The Division and Overlapping scheme extends the Bloom filter design for numerical range query, where traditional Bloom filter cannot be used.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009556520
http://hdl.handle.net/11536/39616
顯示於類別:畢業論文


文件中的檔案:

  1. 652001.pdf

若為 zip 檔案,請下載檔案解壓縮後,用瀏覽器開啟資料夾中的 index.html 瀏覽全文。