Title: Design and Study of Automated Text Summarization for Extracting SAO Structures from Chinese Patent Documents (基於SAO結構之中文專利文件自動摘要技術研究)
Author: 劉翰卿
Han-Ching Liu
楊維邦
蒙以亨
Dr. Wei-Pang Yang
Dr. I-Heng Meng
College of Computer Science, Degree Program in Computer Science
Keywords: Automated Text Summarization; Chinese Patent Documents; SAO; Subject-Action-Object; Heuristic Rules
Issue Date: 2004
Abstract: Automated text summarization distills the most important information from a source document into an abridged version, shortening reading time and improving work efficiency. This thesis takes the basic English sentence pattern of subject, verb, and object (Subject-Action-Object, abbreviated SAO) as its basis: through a series of analysis, computation, and processing steps, it automatically interprets the full text of a patent document, extracts its essence, and compiles a brief, concise summary. The aim is to let corporate R&D departments, patent engineers, industry analysts, and intellectual property staff quickly grasp the concepts a patent describes without reading the dense full text, and so reach the information they need sooner. In the prototype experiment, sixteen patent documents from the e-commerce field served as the test material, and the SAO structure was applied to summary extraction from Chinese patent documents. The evaluation indicates that both the concept extraction and the SAO extraction algorithms perform reasonably well. On average, concept extraction achieved a recall of 95.34% and a precision of 92.13%, while SAO structure extraction achieved a recall of 92.45% and a precision of 93.79%.
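The recall and precision figures in the abstract can be read against the standard definitions: precision is the share of extracted items that are correct, and recall is the share of reference items that were found. The following is a minimal sketch of that computation, assuming exact set matching between extracted and reference items; the record does not describe the thesis's actual evaluation procedure, and the function name `precision_recall` and the sample SAO triples are hypothetical, invented purely for illustration.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall(
    extracted: Iterable[Hashable], reference: Iterable[Hashable]
) -> Tuple[float, float]:
    """Return (precision, recall) under exact set matching (an assumption)."""
    extracted_set = set(extracted)
    reference_set = set(reference)
    # Items that appear in both the system output and the reference.
    true_positives = len(extracted_set & reference_set)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(reference_set) if reference_set else 0.0
    return precision, recall


# Hypothetical (subject, action, object) triples, not taken from the thesis.
reference = {
    ("server", "encrypts", "payment data"),
    ("buyer", "submits", "order"),
    ("system", "verifies", "certificate"),
    ("seller", "confirms", "order"),
    ("gateway", "forwards", "request"),
}
extracted = {
    ("server", "encrypts", "payment data"),
    ("buyer", "submits", "order"),
    ("system", "verifies", "certificate"),
    ("system", "stores", "order"),  # a spurious extraction
}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2%}, recall={recall:.2%}")
```

In this toy case three of the four extracted triples are correct and three of the five reference triples are found, so precision is 75% and recall is 60%. Averaging such per-document scores over a test set is one common way to arrive at overall figures like those reported, though whether the thesis averaged per document or pooled counts is not stated here.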
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009167572
http://hdl.handle.net/11536/63868
Appears in Collections: Thesis


Files in This Item:

  1. 757201.pdf
  2. 757202.pdf
  3. 757203.pdf

If a file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.