Title: New Knowledge-Based Fuzzy Information Retrieval Methods (以知識庫為基礎的模糊資訊擷取新方法)
Authors: Yih-Jen Horng (洪一禎)
Shyi-Ming Chen (陳錫明)
Chia-Hoang Lee (李嘉晃)
Institute of Computer Science and Engineering
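Keywords: information retrieval systems; knowledge bases; fuzzy concept networks; inheritance hierarchies; fuzzy hierarchical clustering; fuzzy inference techniques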
Issue Date: 2002
Abstract: By drawing on the knowledge they contain, knowledge bases can help information retrieval systems retrieve documents relevant to the user's query in a more flexible and more intelligent manner. Because fuzzy concept networks, which consist of nodes and directed links, can easily represent the relationships between meaningful entities in an information retrieval environment, many fuzzy information retrieval methods use fuzzy concept networks as knowledge bases. However, the relevance values between concept nodes in traditional fuzzy concept networks are restricted to real values between zero and one, and the concepts can be related to each other by only one kind of fuzzy relationship. Fuzzy information retrieval methods based on traditional fuzzy concept networks are therefore not flexible enough for practical applications.
In this dissertation, we first propose fuzzy-valued concept networks, in which the relevance values between concepts can be represented by triangular or trapezoidal fuzzy numbers. We also propose a method for finding inheritance hierarchies in fuzzy-valued concept networks. We then extend the definition of fuzzy-valued concept networks so that the relevance values between concepts can be represented by fuzzy numbers of arbitrary shape, and we propose an information retrieval method based on this kind of network. We also propose multi-relationship fuzzy concept networks, in which a concept can be related to another concept by multiple kinds of fuzzy relationships simultaneously; such networks are very useful for document retrieval in fuzzy information retrieval systems. To reduce the effort of constructing multi-relationship fuzzy concept networks, we propose a method that constructs them automatically from training documents.
We also present a new method for fuzzy information retrieval based on fuzzy hierarchical clustering and fuzzy inference techniques. First, we propose a fuzzy agglomerative hierarchical clustering algorithm that clusters documents and obtains the cluster center of each document cluster. Then, we propose a method for constructing fuzzy logic rules from the document clusters and their cluster centers. Finally, we apply the constructed fuzzy logic rules to expand the user's query, so that the information retrieval system retrieves more documents that are relevant to the user's request.
Finally, we present a new method for fuzzy information retrieval based on document term reweighting. The method modifies the weights of terms in document descriptor vectors according to the user's relevance feedback. After the weights are modified, the degrees of satisfaction of relevant documents with respect to the user's query increase, and the degrees of satisfaction of irrelevant documents decrease. The adjustment vectors of the document descriptor vectors are then stored as the user's personal profile for use in future query processing.
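The term reweighting idea described at the end of the abstract can be illustrated with a small Python sketch. The abstract does not give the update rule or the matching function, so everything here is an assumption made for illustration: the names `satisfaction`, `reweight`, and `apply_adjustment`, the learning rate `rate`, the rule that moves query-term weights toward 1 for relevant documents and toward 0 for irrelevant ones, and the use of a mean as the degree of satisfaction. It is not the dissertation's actual method.

```python
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight in [0, 1]


def satisfaction(doc: Vector, query_terms: List[str]) -> float:
    """Assumed matching function: mean document weight over the query terms."""
    return sum(doc.get(t, 0.0) for t in query_terms) / len(query_terms)


def reweight(doc: Vector, query_terms: List[str], relevant: bool,
             rate: float = 0.5) -> Vector:
    """Return an adjustment vector for the query terms (assumed update rule).

    Relevant documents move each query-term weight toward 1,
    irrelevant documents move it toward 0.
    """
    target = 1.0 if relevant else 0.0
    return {t: rate * (target - doc.get(t, 0.0)) for t in query_terms}


def apply_adjustment(doc: Vector, adjustment: Vector) -> Vector:
    """Add the adjustment to the document vector, clamping weights to [0, 1]."""
    updated = dict(doc)
    for term, delta in adjustment.items():
        updated[term] = min(1.0, max(0.0, updated.get(term, 0.0) + delta))
    return updated


# Hypothetical document descriptor vectors, query, and relevance feedback.
docs = {
    "d1": {"fuzzy": 0.6, "retrieval": 0.4},
    "d2": {"fuzzy": 0.5, "network": 0.8},
}
query = ["fuzzy", "retrieval"]
feedback = {"d1": True, "d2": False}  # user marks d1 relevant, d2 irrelevant

# The adjustment vectors play the role of the stored personal profile.
profile = {name: reweight(vec, query, feedback[name]) for name, vec in docs.items()}
updated = {name: apply_adjustment(vec, profile[name]) for name, vec in docs.items()}

for name in docs:
    print(name, satisfaction(docs[name], query), satisfaction(updated[name], query))
```

Under these assumptions, the satisfaction of the document marked relevant rises and that of the document marked irrelevant falls, and keeping only the adjustment vectors in `profile` lets the original descriptor vectors stay unchanged for other users.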
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT910394006
http://hdl.handle.net/11536/70178
Appears in Collections: Thesis