Title: Automatic XML Transformation by Using Web Data Extraction Tools
Author: 林宜融
Keywords: XML; Extraction; XML Transformation
Publication date: 2006
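Abstract: With the arrival of the Web 2.0 era, related applications and technologies have developed rapidly, and large volumes of information now travel quickly over the Internet. Integrating diverse, constantly updated electronic documents has therefore become an important problem. XML lets users define their own tags and attributes, so storing data as XML captures the structure of a document's content. XSLT and XQuery are two common scripting languages for describing how to query multiple XML documents and transform them into HTML documents for presentation. Traditionally, both integrating multiple data sources and the coordination between programmers and Web designers needed to write these XSLT or XQuery scripts have been time-consuming. This thesis therefore proposes a system that generates XQuery scripts automatically. The system analyzes the extraction rules that a Web data extraction tool with a graphical interface records while extracting data from Web pages, and from these rules generates XQuery scripts that present the extracted data as Web pages again.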
As the Internet becomes more popular, a large number of documents are published via Web sites. XML is a standardized format designed for representing structured data in Web applications. Transforming XML documents has thus become an important issue in integrating multiple, frequently updated Web sites. Currently, XSLT and XQuery are two common ways to describe how XML documents are transformed into HTML for presentation. However, developing XSLT or XQuery scripts is time-consuming. In this thesis, a system that generates XQuery scripts automatically is developed. The proposed system uses existing visual Web data extraction tools to capture the mapping relationships between the source and destination documents of the transformation. By recording and analyzing these mapping relationships, the system can automatically generate a transformation script, such as an XQuery script.
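To make the idea of turning recorded mappings into an XQuery script concrete, here is a minimal sketch. It assumes a hypothetical, flat mapping format: one record path (which element repeats once per extracted record) plus a list of field paths with user-given labels. The names `FieldMapping`, `generate_xquery`, `books.xml`, and the `/catalog/book` layout are illustrative only, not the thesis's actual data structures or tool output. The generator emits an XQuery FLWOR expression that rebuilds the extracted records as an HTML table.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class FieldMapping:
    """Hypothetical record of one extraction rule captured from the visual tool."""
    source_path: str  # path of the field, relative to one record element (e.g. "title")
    label: str        # heading the user gave the field; becomes a table column header


def _xquery_text(text: str) -> str:
    """Escape literal text for XQuery direct element content (&, <, > and curly braces)."""
    return escape(text).replace("{", "{{").replace("}", "}}")


def generate_xquery(doc_uri: str, record_path: str, fields: list[FieldMapping]) -> str:
    """Emit an XQuery FLWOR expression that re-renders the extracted records as an HTML table."""
    uri = doc_uri.replace('"', '""')  # XQuery escapes a quote inside a string literal by doubling it
    header = "".join(f"<th>{_xquery_text(f.label)}</th>" for f in fields)
    # One enclosed expression per column; data() atomizes the matched node to its typed value.
    cells = "".join(f"<td>{{data($r/{f.source_path})}}</td>" for f in fields)
    return (
        "<table>\n"
        f"  <tr>{header}</tr>\n"
        "  {\n"
        f'    for $r in doc("{uri}"){record_path}\n'
        f"    return <tr>{cells}</tr>\n"
        "  }\n"
        "</table>"
    )


if __name__ == "__main__":
    # Hypothetical mapping: each <book> under <catalog> in books.xml becomes one table row.
    mappings = [FieldMapping("title", "Title"), FieldMapping("price", "Price")]
    print(generate_xquery("books.xml", "/catalog/book", mappings))
```

Running it prints a query whose `for $r in doc("books.xml")/catalog/book` clause iterates over the records and whose `return` clause builds one `<tr>` per record; an XQuery processor would then evaluate that query to produce the HTML page. This sketch only covers flat, table-shaped extractions; nested records would need nested FLWOR expressions.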