Title: A Study of Knowledge Representation and Transformation for Case-based Expert Systems (案例式知識表示法及轉換法之研究)
Authors: Mon-Fong Jiang (江孟峰)
Shian-Shyong Tseng (曾憲雄)
Institute of Computer Science and Engineering
Keywords: Case-based Reasoning; Data Mining; Knowledge Representation; Clustering; Expert Systems
Issue date: 1999
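Abstract: In recent years, expert systems have been widely used in industry because they suit many kinds of applications. A case-based expert system stores its knowledge in the knowledge base as cases, but how to build a case base from which appropriate cases can be retrieved effectively remains a bottleneck in this field. In this thesis, we first propose a new knowledge representation, SC, that expresses the relationships among cases: once the case base has been constructed, the hierarchy of all cases has been constructed as well. For any given case, the representation makes it easy to find the most similar case, which helps users retrieve suitable cases effectively. To describe the properties of the representation in detail, we define successor, immediate successor, and cover, give algorithms for finding them, and analyze the correctness and time complexity of those algorithms; we also use finite-state machines to illustrate the algorithms' performance. For this representation we also develop a development process for case-based expert systems, covering case base construction, case retrieval, and case adaptation. For case base construction, we first propose a suitable data transformation method, focusing on the transformation of data types. The proposed two-phase transformation completes the more tedious work in the first phase, so that users can concentrate on choosing a data mining method in the second phase. Once all raw data have been transformed, a two-phase clustering procedure follows: when there are too many cases, the first phase groups similar cases to reduce their number and lower the cost of constructing the case base in the second phase, and it can also detect abnormal cases so that they have less influence on the construction of the whole case base. To verify the effectiveness of the whole process, we use an e-mail application as an example of data type transformation: after the first-phase transformation, users can choose a data mining method based only on the characteristics of the problem, without having to consider transformation issues. To validate the clustering procedure, we run experiments on the Iris data (IRIS), sugar-cane breeding data, and an e-mail log; the results show that the two-phase clustering procedure finds abnormal clusters more effectively than traditional methods. Finally, we build a consultation system for personnel regulations; a comparison with the traditional approach shows that users can indeed find appropriate cases more easily.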
Expert systems are widely used in business and industrial environments because they suit a variety of applications. A case-based expert system (CBES) is a kind of expert system that represents its knowledge base as cases. However, constructing the case base so that users can effectively retrieve appropriate cases is still a bottleneck in building a CBES. In this work, we first propose a new knowledge representation, named Structural Cases (SC), to describe the relationships among all the cases. As soon as the case base is constructed, the hierarchy of all cases is also constructed. Compared with the traditional flat structure of a case base, the case hierarchy may enhance the efficiency of case retrieval. Based upon the case hierarchy, algorithms are proposed to find the most similar case for any arbitrary case, and a finite automaton is used to illustrate the efficiency of these algorithms. Moreover, algorithms for an interaction process based on SC are proposed to retrieve more suitable results. Based upon the proposed representation and algorithms, a CBES development process including case base construction, case retrieval, and case adaptation is also proposed. For case base construction, a two-phase data type transformation framework consisting of a merging phase and a transforming phase is first proposed. The preprocessing work of data type transformation is often tedious or complex because many data types exist in the real world. With the two-phase framework, the preprocessing work is finished in the first phase, so users only need to determine which kinds of mining algorithms will be used in the following phase. After the raw data have been transformed into suitable data types, a two-phase clustering-based approach is used to construct the structural cases.
In the clustering step, we want to find the outlier cases before constructing the case base in order to reduce their influence. Our idea is first to partition the data points into several clusters, each of which consists either entirely of outliers or entirely of non-outliers. After this partitioning, outliers can be sought among clusters rather than among individual points, which may reduce the time complexity of finding the outlier clusters. To verify the practicability and performance of our CBES development process, several experiments were carried out. First, for the data type transformation process, an e-mail management system was implemented to help users manage their e-mail by finding rules about their interests. The experimental results show that the data type transformation process is practicable. Second, three experimental data sets, namely two-dimensional data from the Iris flower data, a four-dimensional sugar-cane breeding data set, and e-mail log data, are used to compare our two-phase clustering process with traditional clustering algorithms. The experimental results show that our method generally works better than traditional clustering algorithms. Finally, an application system for Taiwan personnel regulations was developed based upon the proposed representation and algorithms. Comparing the accuracy of retrieval results with and without our system, we found that the retrieval results using our system are better than those of traditional approaches and that the users' query process is simplified.
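The two-phase clustering idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the algorithms used in either phase, so everything below is an assumption made for illustration: phase one uses a simple greedy "leader" partition with a hypothetical `radius` threshold, and phase two flags clusters smaller than a hypothetical `min_size` as outlier clusters. It is not the thesis's actual procedure.

```python
from math import dist  # Euclidean distance, Python 3.8+


def phase_one(points, radius):
    """Phase 1 (assumed technique): greedy leader partition.

    Each point joins the first cluster whose leader (its first member)
    lies within `radius`; otherwise it starts a new cluster.
    """
    clusters = []  # list of clusters; clusters[i][0] is the leader
    for p in points:
        for c in clusters:
            if dist(p, c[0]) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def phase_two(clusters, min_size):
    """Phase 2 (assumed criterion): clusters below `min_size` are outliers."""
    normal = [c for c in clusters if len(c) >= min_size]
    outliers = [c for c in clusters if len(c) < min_size]
    return normal, outliers


# Illustrative data only: two dense groups and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
    (5.0, 5.0), (5.1, 5.0), (4.9, 5.2),
    (20.0, 20.0),
]
clusters = phase_one(points, radius=1.0)
normal, outliers = phase_two(clusters, min_size=2)
print(len(normal), "normal clusters;", "outlier clusters:", outliers)
```

In this sketch the second phase inspects three clusters instead of seven points, which reflects the abstract's point that deciding at the cluster level may cost less than deciding point by point.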
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT880394034
http://hdl.handle.net/11536/65529
Appears in collections: Thesis