Title: 利用網頁挖掘技術擴充企業資料倉儲的外部資訊
Enriching Enterprise Data Warehouse by Web Mining Technology
Authors: 余俊緯
Jun-Wei Yu
Chyan Yang
Keywords: Web Mining; Document Warehouse; Data Warehouse
Issue Date: 2001
Abstract: Data warehousing and online analytical processing (OLAP) are two of the most significant technologies for decision support. Both aim to help knowledge workers such as executives, managers, and analysts make better decisions in less time, and a growing number of companies have adopted them to improve decision quality and competitiveness. Even so, the data warehouse has inherent limitations: it stores only numeric data, most of which is derived from operational data inside the enterprise. In other words, it covers structured data only and holds very little semi-structured or unstructured data. Although OLAP can perform complex analysis over the data stored in the data warehouse, numeric data alone is usually not enough; knowledge workers often need related external information, in other forms, to support their decisions. To address this problem, this study applies web mining technology to extract relevant and valuable web content from the Internet and stores that content in a document warehouse. Combining the textual information in the document warehouse with the numeric data in the data warehouse is intended to give knowledge workers suitable external information, compensate for the numeric-only limitation of the data warehouse, and further strengthen the enterprise's business intelligence. In addition, this study proposes WMIS, a prototype web information mining system. WMIS is an agent system that combines web text mining and multi-dimensional document analysis to help users mine valuable information from HTML documents effectively.
Appears in Collections: Thesis