Title: Chinese Idiom Information Retrieval System Based on the Idiom Semantics (以成語涵義為基礎之中文成語檢索系統)
Authors: Chang, Chen-Lin (張正霖); Hwang, Ming-Jiu
Keywords: idiom search; information retrieval; term weight; query expansion; facet query; revised query
Publication date: 2009
Abstract: Current Chinese idiom retrieval systems mainly offer single-keyword definition search, character search, category search, and search by the radical, pinyin, or stroke count of an idiom's first character. However, users looking for an idiom often remember only its meaning, not its wording, which makes the idiom hard to find with these functions. This study builds an idiom retrieval system based on idiom meaning, MIRS (Meaning of Idiom Retrieval System), to address this problem. MIRS consists of four modules: idiom data pre-processing, query processing, retrieval processing, and result display. Users enter a simple, colloquial query, and MIRS finds idioms more accurately by applying query expansion and increasing keyword weights. The system also computes keyword statistics and classifications over the search results, so that users can locate idioms effectively through its "Facet Query" and "Revised Query" functions. In addition, MIRS adopts the Web 2.0 concept: users can contribute synonyms and suggest idiom definitions, which further improves retrieval effectiveness. The evaluation of MIRS found that its functions let users choose the most suitable retrieval method, making searching friendlier and results more precise.
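The abstract does not specify how MIRS expands queries, weights keywords, or builds its facets. As a rough illustration only, the Python sketch below assumes a hand-made synonym table, a toy idiom-to-definition corpus, fixed weights (2.0 for the user's own keywords, 1.0 for synonyms added by expansion), keywords already extracted from the colloquial question, and plain substring matching against each definition. None of these names, values, or data come from MIRS; `expand_query`, `search`, `facet_counts`, and `facet_filter` are hypothetical helpers showing how keyword statistics over the results could drive a facet-style refinement.

```python
from collections import Counter

# Hypothetical synonym table (illustrative only; MIRS also collects user-suggested synonyms).
SYNONYMS = {
    "努力": ["勤奮"],  # "effort" -> "diligent"
    "堅持": ["恆心"],  # "persist" -> "perseverance"
}

# Hypothetical toy corpus: idiom -> paraphrased definition (not taken from MIRS data).
IDIOMS = {
    "持之以恆": "長久堅持下去,比喻做事有恆心",            # persevere over the long term
    "鐵杵磨成針": "比喻只要有恆心、肯努力,做任何事都能成功",  # perseverance achieves anything
    "半途而廢": "做事中途停止,不能堅持到底",              # give up halfway
    "勤能補拙": "勤奮可以彌補天資的不足",                  # diligence makes up for lack of talent
}

ORIGINAL_WEIGHT = 2.0  # assumed: the user's own keywords count double
EXPANDED_WEIGHT = 1.0  # assumed: synonyms added by expansion count once


def expand_query(keywords):
    """Return {term: weight}, adding each keyword's synonyms at a lower weight."""
    weights = {}
    for kw in keywords:
        weights[kw] = max(weights.get(kw, 0.0), ORIGINAL_WEIGHT)
        for syn in SYNONYMS.get(kw, []):
            weights[syn] = max(weights.get(syn, 0.0), EXPANDED_WEIGHT)
    return weights


def search(keywords, corpus=IDIOMS):
    """Score each idiom by the summed weights of query terms found in its definition."""
    weights = expand_query(keywords)
    results = []
    for idiom, definition in corpus.items():
        matched = [t for t in weights if t in definition]
        if matched:
            results.append((idiom, sum(weights[t] for t in matched), matched))
    # Highest score first; ties broken by idiom text so the order is deterministic.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


def facet_counts(results):
    """Count how many result idioms each matched term appears in (facet statistics)."""
    return Counter(t for _, _, matched in results for t in matched)


def facet_filter(results, term):
    """Narrow the results to idioms whose definition matched the chosen facet term."""
    return [r for r in results if term in r[2]]


if __name__ == "__main__":
    hits = search(["努力", "堅持"])
    for idiom, score, matched in hits:
        print(idiom, score, matched)
    print(facet_counts(hits))
```

With the keywords 努力 and 堅持, 持之以恆 and 鐵杵磨成針 both score 3.0 and rank first. Note that 半途而廢 ("give up halfway") also matches 堅持 through the phrase 不能堅持到底: plain substring matching ignores negation, which is one reason a real system would need more careful processing of both queries and definitions.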


