標題: XML文件漸進式結構化索引更新與路徑表示式檢索之研究
Incremental Structured Index Update Mechanism and Path Expression Query for XML Documents
作者: 洪健哲
Chien-Che Hung
柯皓仁
楊維邦
Hao-Ren Ke
Wei-Pang Yang
資訊科學與工程研究所
關鍵字: 漸進式結構化索引更新;k-ary樹狀結構;路徑表示式檢索;分裂方法;可擴展標示語言;Incremental Structured Index Update;k-ary Tree;Path Expression Query;Split Method;XML
公開日期: 2001
摘要: XML文件中隱含了豐富的結構化資訊,為了能利用這些結構化資訊提供結構化檢索(Structured Query),本論文利用k-ary 樹狀結構的特性來建立結構化資訊索引(Structured Index)。基於此索引結構,本論文提出一套漸進式結構化索引更新機制來避免文件結構更新時需重新建置結構化資訊索引,此機制稱為Split Method。此外,本論文實作了一套SQL-like的路徑表示式檢索(Path Expression Query),可讓使用者在結構資訊不規則或不明確的情況下透過結構化資訊索引找出滿足檢索路徑條件的資料,並將檢索結果以XML呈現。 在結構化文件索引更新中,若加入資料的節點分支數超出索引結構之最大分支數k時,傳統的方法係將結構索引重建,當結構資料時常變更時,此索引結構重建的步驟相當耗時。有鑑於此,本論文乃提出一個漸進式結構化索引更新的機制來避免索引重新建置。此方法主要利用資料分屬不同文件的概念,即分裂的節點所屬之文件編號(Document Identifier, DID)與原始資料之DID的不同。當加入資料的節點超出索引結構之最大分支數時,從該節點分裂出一個節點,而分裂的節點另屬於不同的DID,並定義一個對等關係來記錄該節點與分裂節點間的關連性。 從效益評估結果顯示,雖然本論文提出之Split Method需要較多的索引空間花費,以本論文中的實驗為例,索引空間花費的Overhead約介於2.38%與17.03%之間,然而在索引更新的效能上大大優於索引重建的方法,且對於實際檢索的效能,我們的方法與原來索引重建的方法相較之下並未相差太大。
XML documents contain abundant structural information that can be employed to better understand the XML documents. Structured queries are queries that search the structural information of XML documents. In order to support structured queries, k-ary trees are used in this thesis as an index structure to store structural information. Based on this index structure, this thesis proposes an incremental structured index update mechanism and implements a few functions of a SQL-like path expression query language, called Lorel. K-ary trees are a well-known index structure can be used to store structural information of XML documents, where k is a predefined maximum number of branches that a tree node can have. When an XML document is modified, its corresponding k-ary tree has to be updated as well for reflecting the modification. However, when a modification makes the number of branches of a tree node exceeds k, the k-ary tree is no longer workable and the whole k-ary tree has to be reconstructed. Reconstructing the whole k-ary tree is time-consuming when the structure information changes frequently. This thesis proposes an incremental structured index update mechanism called the Split Method to avoid reconstructing the k-ary tree when the number of branches of a node exceeds k. Through splitting nodes and defining an equivalent relationship, this mechanism can update k-ary trees efficiently. Several evaluations were conducted to assess the Split Method and the k-ary tree reconstruction. The evaluation shows that the Split Method only wastes a little more space than tree reconstruction, but it is more efficient than tree reconstruction in updating k-ary trees. In addition, based on the k-ary tree structured index, this thesis implements a few functions of a SQL-like path expression query language, called Lorel. Path expression queries facilitate user to issue structural queries when the document structure is irregular or unknown in advance. The results of queries are formatted as XML documents.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT900394086
http://hdl.handle.net/11536/68614
Appears in Collections:Thesis


Files in This Item:

  1. 039408601.pdf

If it is a zip file, please download the file and unzip it, then open index.html in a browser to view the full text content.