Title: HTML文件至WML文件之自動轉換系統
Automatic Transformation from HTML Documents to WML Documents
Authors: 徐元瑛
Yuan-Ying Hsu
曾建超
Chien-Chao Tseng
Institute of Computer Science and Engineering
Keywords: WAP;HTML;WML
Issue Date: 1999
Abstract: With advances in wireless communication and mobile handsets, accessing information on the Internet from a mobile terminal anytime and anywhere is no longer a dream. Mobile terminals and the wireless environment nevertheless impose inherent limitations. To address them, the WAP Forum defined the Wireless Application Protocol (WAP), under which a mobile terminal reaches the Internet through a WAP gateway. WAP requires web documents to be written in Wireless Markup Language (WML), whereas existing Internet documents are written in HTML, so a WAP terminal cannot display them directly. Internet content providers (ICPs) who want to serve WAP users must therefore learn another language and build a separate set of WML pages, which is a clear overhead.
In this thesis, we propose an automatic transformation filter that can be implemented on the WAP gateway. When an HTML document passes through the gateway from the origin web server, the filter translates it into a set of WML documents and delivers them to the client. Because handsets mainly display text and simple graphics, the filter targets plain HTML content and ignores scripts used for page decoration and other program-control parts. We first use compiler techniques to parse the HTML document into a parse tree, then generate the WML documents by traversing the tree. During traversal, we infer the structure of the HTML document from statistics gathered on real documents and choose the most suitable transformation given that structure and the limitations of WML, so that the hierarchy and completeness of the original document are preserved. Finally, an encoder converts the Chinese text from Big-5 to Unicode. No WML document produced by the transformation exceeds the maximum file size defined for WML.
Experimental results show that 92 percent of the documents can be transformed into an acceptable structure, and 86 percent of the WML documents correctly preserve the structure of the original HTML documents. The more clearly the original HTML document is structured, the better the transformation result.
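The parse-then-traverse pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that each HTML heading starts a new WML card, that `head`, `script`, and `style` subtrees are dropped, and that Big-5 input is decoded to Unicode before parsing. The names `Node`, `TreeBuilder`, `collect_cards`, and `html_to_wml` are hypothetical. The WML DOCTYPE, images, links, and the splitting of output to respect the deck size limit are omitted.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Assumption: these subtrees carry no displayable content for a handset.
SKIPPED = {"head", "script", "style"}
VOID = {"br", "img", "hr", "meta", "link", "input"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class Node:
    """One node of the parse tree; text nodes use the tag '#text'."""
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children, self.text = tag, parent, [], ""


class TreeBuilder(HTMLParser):
    """Builds a simple parse tree from an HTML string."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID:          # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            text = Node("#text", self.current)
            text.text = " ".join(data.split())
            self.current.children.append(text)


def text_of(node):
    """Concatenates the text of a node and all its descendants."""
    if node.tag == "#text":
        return node.text
    return " ".join(filter(None, (text_of(c) for c in node.children)))


def collect_cards(node, cards):
    """Depth-first traversal: headings open a card, text fills the last card."""
    if node.tag in SKIPPED:
        return
    if node.tag in HEADINGS:
        cards.append({"title": text_of(node), "body": []})
    elif node.tag == "#text":
        cards[-1]["body"].append(node.text)
    else:
        for child in node.children:
            collect_cards(child, cards)


def html_to_wml(html):
    """Converts an HTML string to a single simplified WML deck."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    cards = [{"title": "Start", "body": []}]   # holds text before any heading
    collect_cards(builder.root, cards)
    parts = ['<?xml version="1.0" encoding="UTF-8"?>', "<wml>"]
    # Cards without body text are dropped in this sketch.
    for i, card in enumerate(c for c in cards if c["body"]):
        title = escape(card["title"], {'"': "&quot;"})
        parts.append(f'<card id="c{i}" title="{title}">')
        parts.extend(f"<p>{escape(t)}</p>" for t in card["body"])
        parts.append("</card>")
    parts.append("</wml>")
    return "\n".join(parts)
```

For Big-5 input, a caller would decode the bytes first, for example `html_to_wml(raw_bytes.decode("big5"))`, and encode the resulting string as UTF-8 when sending it to the client. Mapping headings to cards is only one possible structural heuristic. The abstract states that the thesis derives its transformation rules from statistics on real HTML documents.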
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT880392091
http://hdl.handle.net/11536/65492
Appears in Collections: Thesis