Title: Automatic Annotation in Specific Domain Using Ontology
Original title: 使用Ontology在特定領域為資訊作自動化的標記
Authors: 陳彥仲
Yan-Zhong Chen
孫春在
Chuen-Tsai Sun
Institute of Computer Science and Engineering
Keywords: Ontology; Information Extraction; Genetic Algorithm; Structural web pages
Publication date: 2002
Abstract: As Internet use has grown steadily in recent years, the number of web pages on the WWW has increased tremendously, and the information in those pages keeps accumulating. On one hand, this means that more and more information is available to us; on the other, it means we need better ways to find the information we need. Traditional search relies mainly on string matching, which works well in many situations but sometimes returns information that is not what the user actually wants. This happens mainly because the information in web pages carries no semantics. To address this problem, we can add semantic annotations that describe the information in a web page, so that the page is no longer just a collection of words and graphics. The system proposed in this thesis uses an ontology to extract information from web pages that contain structured information, and uses a Genetic Algorithm to generate the extraction rules automatically. Combining the ontology with the Genetic Algorithm not only optimizes the generated extraction rules but also lets the system handle different web sites that carry the same kind of information without major adjustments. Once suitable extraction rules have been generated, the system uses them to extract information from pages of that kind and produces annotations in the chosen annotation language. The mechanism presented in this thesis moves us one step closer to a meaningful and useful Semantic Web.
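Illustration: the pipeline described in the abstract (ontology-guided fitness, rules generated by a genetic algorithm, then annotation) can be sketched roughly as follows. This is not the thesis's implementation. The mini-ontology, the sample rows, the rule encoding (one column index per concept), and the tag format are all assumptions made up for illustration.

```python
import random
import re

# Hypothetical mini-ontology: concept name -> pattern its values should match.
ONTOLOGY = {
    "isbn": r"\d{3}-\d{10}",
    "year": r"\d{4}",
    "price": r"\$\d+\.\d{2}",
}
CONCEPTS = list(ONTOLOGY)

# Made-up rows, as if already split out of a structured (tabular) page.
ROWS = [
    ["Intro to Ontologies", "978-0131103627", "2001", "$35.00"],
    ["Genetic Search", "978-0262133166", "1998", "$42.50"],
    ["Web Data Mining", "978-3540378815", "2002", "$59.95"],
]
N_COLS = len(ROWS[0])


def fitness(rule):
    """Count cells whose text fits the concept the rule assigns to them."""
    return sum(
        bool(re.fullmatch(ONTOLOGY[concept], row[col]))
        for row in ROWS
        for concept, col in zip(CONCEPTS, rule)
    )


def evolve(seed=0, pop_size=30, generations=50, mutation_rate=0.2):
    """Evolve a rule: a tuple holding one column index per ontology concept."""
    rng = random.Random(seed)
    pop = [tuple(rng.randrange(N_COLS) for _ in CONCEPTS) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; the best rules survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(CONCEPTS))  # one-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(len(child)):
                if rng.random() < mutation_rate:  # per-gene random mutation
                    child[i] = rng.randrange(N_COLS)
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=fitness)


def annotate(row, rule):
    """Wrap each extracted value in a tag named after its ontology concept."""
    return "".join(
        f"<{concept}>{row[col]}</{concept}>" for concept, col in zip(CONCEPTS, rule)
    )


best = evolve()
annotations = [annotate(row, best) for row in ROWS]
```

In this toy setting the ontology supplies the fitness signal, so the same search could be rerun on a site with a different column layout without changing the code, which is the kind of portability the abstract claims for the combined approach.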
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT910394033
http://hdl.handle.net/11536/70205
Appears in Collections: Thesis