Title: | 非監督式中文寫作自動評閱系統之研究與設計 Unsupervised Chinese Essay Scoring System---Study, Design and Implementation |
Author: | 李嘉晃 LEE CHIA-HOANG, Department of Computer Science, National Chiao Tung University (國立交通大學資訊工程學系(所)) |
Publication Date: | 2008 |
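Abstract: | Writing is a form of integrated training: it exercises command of language as well as the ability to think. Through the writing process, students practice thinking, comprehension, reasoning, and creative expression; writing also reveals whether a student has understood the national language and can apply it flexibly. In daily life, letters, official documents, presentations, résumés, and social correspondence for weddings, funerals, and other occasions all fall within the scope of composition. Nearly everyone encounters composition to some degree, which underscores the need for writing instruction.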
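Efficient evaluation of written work is an important research topic in fields such as e-learning, psychometrics, and educational testing. Because human grading is costly and struggles to achieve the inter-rater reliability that testing requires, Automated Essay Scoring (AES), which uses machines and artificial intelligence techniques to assist in evaluating written work, is regarded as an important solution. In the 1990s, with substantial advances in natural language processing, highly accurate systems appeared one after another and were applied to large-scale entrance examinations and writing instruction.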
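However, both English-Chinese contrastive rhetoric and our previous research indicate that applying existing AES systems directly to Chinese writing is quite difficult. We therefore previously developed a Chinese Automated Essay Scoring (CAES) system. This prototype is still some distance from practical use, for two main reasons. First, the system requires a certain number of human-scored training samples. Second, the system is at risk of being maliciously deceived. Most current AES systems require a considerable number of human-scored training samples, so deploying one demands substantial grading effort up front, followed by system training, before scoring can actually begin. A system that needs no pre-scored samples would therefore better match the needs of a real scoring system. As for malicious deception, addressing it is an important factor in making the system usable in practice.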
Through this project we expect to improve the performance and applicability of the existing CAES prototype. We hope to use unsupervised learning to remove the need for large-scale human scoring in advance, and to counter malicious deception with a wider variety of features and off-topic detection. The results of this project can provide a practical, high-performance research tool for many fields, such as e-learning, psychometrics, and educational testing. The research will strengthen and extend three parts of the original CAES system: first, a method for building the CAES prediction model with unsupervised learning; second, a new method for detecting off-topic essays; third, a wider variety of features, in particular the structural features missing from the current prototype. We expect that the strengthened system can enter practical operation and serve as a strong competitive tool in the increasingly competitive overseas Chinese-learning market.
Automatic essay scoring (AES) systems are important research tools in areas such as educational testing and psychometrics, because studies in these domains often rely on a large number of writings for their analyses. It is often very difficult, however, to obtain a large number of graded writings, because human grading is expensive and time-consuming. For English, the successful development of automatic essay scoring systems has overcome these limitations and greatly facilitated progress in these research areas. By contrast, the lack of a Chinese automatic essay scoring system (CAES) has limited the scale, quantity, and validity of such research. The linguistic differences between Chinese and English suggest the need to reconsider various issues when designing a CAES, so it is difficult to apply the current techniques of English AES systems to Chinese writing. We previously developed a Chinese automated essay scoring system, but it is far from practical application in the real world. Two main difficulties exist: (1) the system needs a large amount of scored training data; (2) there is a risk that a hostile user might exploit weaknesses in the system so that a poor essay receives a high score. The aim of this study is to propose and develop a second generation of our Chinese automatic essay scoring system so that it can be used in educational research. Our proposal focuses on (1) developing an unsupervised machine learning model for the CAES system, (2) developing a method for detecting off-topic writing, and (3) designing a wide variety of features, in particular structural and semantic features. We expect the proposed system to become a powerful tool for learning Chinese writing. |
Official Document #: | NSC97-2221-E009-135 |
URI: | http://hdl.handle.net/11536/102770 https://www.grb.gov.tw/search/planDetail?id=1688045&docId=291085 |
Appears in Collections: | Research Plans |