Handwritten Chinese Character Recognition Based on Automatically Generated Stroke Structural Sequence Codes

Title:	以自動產生的文字筆序結構碼進行手寫中文字辨識 Handwritten Chinese Character Recognition Based on Automatically Generated Stroke Structural Sequence Codes
Authors:	李其瑋 Lee, Chi-Wei; 陳稔 Zen Chen; Institute of Computer Science and Engineering
Keywords:	Handwritten Chinese character; Character recognition; Stroke sequence; Stroke structural sequence code; Feature perturbation
Issue date:	1995
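Abstract (translated from Chinese):	This dissertation studies how to generate stroke structural sequence codes from handwritten Chinese characters and how to apply them to Chinese character recognition. It mainly targets handwritten Chinese characters entered on-line; combined with a suitable stroke extraction procedure, the approach could also handle off-line handwritten characters. The dissertation has three parts. The first part proposes a set of stroke ordering rules that automatically derive a stroke sequence for a Chinese character. The rules involve no special radical knowledge or knowledge of character block layout, so they are easy to implement on a computer, and the resulting stroke sequences are identical or very close to the standard dictionary stroke order. To tolerate writing variations, the part of the stroke structural sequence code that is least affected by handwriting is extracted as the preclassification code, which is then used to preclassify handwritten Chinese characters. The second part proposes a stroke feature perturbation method to handle variations in handwritten Chinese characters. Its basic idea is to guess which features may be erroneous and restore them to their correct values, so that the erroneous features of the input sample are corrected and the correct recognition result is obtained. The method can be applied to input samples to correct features that are wrong because of writing variation. It can also be applied to the character database to automatically generate representative variant samples of each character. The third part searches, within the clusters produced by preclassification, for suitable features from which to build detailed matching codes. Features that are stable and consistent within a character are combined into positive detailed matching codes, and each character in the database is represented by one or more positive codes. To keep the number of detailed matching codes as small as possible, negative detailed matching codes can be used to exclude other, similar characters. The recognition process has three stages: preclassification, detailed matching, and stroke feature perturbation. An input sample is first assigned to a cluster by its preclassification code and then compared against the detailed matching codes of the characters in that cluster. If it matches the detailed matching code of some character, recognition is complete; if it matches none of them, the stroke feature perturbation method generates a new stroke structural sequence code and preclassification code, and recognition is repeated. Experiments verify the proposed methods; the results show that they can handle the problems that writing variation causes in preclassification and detailed matching, and that they achieve good recognition performance.
Abstract (English):	The dissertation is concerned with the automatic generation of stroke structural sequence codes from handwritten Chinese characters and their use in the recognition problem. In the current implementation, we consider primarily the on-line handwritten Chinese character recognition problem, although the system can also be used for off-line characters if the character strokes are extracted in advance. The dissertation consists of three parts. In the first part, we propose a set of stroke ordering rules that produce a unique stroke sequence for each Chinese character. The rules require no special radical knowledge or knowledge of character block layout forms, so they are easy to implement on a machine. Moreover, the derived stroke sequences are similar to those given in the dictionary, if not the same. To deal with writing variations among writers, we generalize the derived stroke structural sequence code to obtain more consistent stroke information. This generalized stroke structural sequence code is then shown to be usable for handwritten Chinese character preclassification.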
In the second part, we propose a method based on the perturbation technique to handle the variations in handwritten Chinese characters. The basic concept of our perturbation technique is to recover possibly erroneous stroke features by replacing them with new perturbed feature values, so that the resulting character features may become normal. The perturbation technique can be applied to the input samples to recover stroke features that may be erroneous because of handwriting variations. It can also be applied to the reference characters in the database to enlarge the database.
In the third part, we design the detailed matching code using new consistent stroke features other than those used in the preclassification code. After preclassification, each cluster contains one or more characters. A character in a cluster is called a legitimate character if the preclassification code associated with the cluster is the right one for it. If a particular character sample has some erroneous features, then its preclassification code is not correct. We only need to design detailed matching codes for the legitimate characters in the cluster. The detailed matching codes must satisfy the completeness and consistency conditions. We use one or more positive codes to meet the completeness condition and, if possible, the consistency condition as well. If the positive code(s) do not satisfy the consistency condition, negative codes are created.
The overall process for handwritten Chinese character recognition consists of three stages: preclassification, detailed matching, and stroke feature perturbation. Each input character sample is first classified based on its preclassification code. Once it is grouped into a preclassification cluster, it is further matched, based on its stroke structural sequence code, against the positive and negative codes of the legitimate characters in the cluster.
If a match is found, the recognition process terminates; if not, the proposed stroke feature perturbation technique is applied to the input character to obtain a perturbed stroke structural sequence code, and a new recognition cycle begins. The process ends when a match is found or no more new perturbed codes are possible.
Experiments are included to evaluate the ideas mentioned above. The results indicate that the proposed methods can handle the preclassification and detailed matching problems caused by handwriting variations among writers and thus achieve a high recognition rate for the difficult handwritten Chinese character recognition problem.
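The three-stage loop described in the abstract (preclassification, detailed matching against positive and negative codes, then stroke feature perturbation on failure) can be illustrated with a toy sketch. Everything concrete below is an assumption made for illustration and is not taken from the dissertation: the one-letter stroke symbols, the choice of (stroke count, first stroke) as the preclassification code, the `CONFUSABLE` table, the position-to-feature form of the matching codes, and the two-character database.

```python
# Toy sketch of the recognition loop; all names and data are hypothetical.
# Stroke symbols (assumed): H horizontal, V vertical, J vertical with hook, T rising.

# Assumed table of features that handwriting variation may turn into one another.
CONFUSABLE = {"H": ["T"], "T": ["H"], "V": ["J"], "J": ["V"]}

def preclass_code(code):
    # Assumption: stroke count and first stroke stand in for the stable
    # part of the stroke structural sequence code.
    return (len(code), code[0])

def satisfies(code, pattern):
    # A pattern maps stroke positions to the feature required at that position.
    return all(i < len(code) and code[i] == f for i, f in pattern.items())

def detailed_match(code, cluster):
    # Accept a character if some positive code holds and no negative code holds.
    for char, ref in cluster.items():
        positive_ok = any(satisfies(code, p) for p in ref["positive"])
        excluded = any(satisfies(code, n) for n in ref["negative"])
        if positive_ok and not excluded:
            return char
    return None

def perturbations(code):
    # Replace one feature at a time with a value it may have been confused with.
    for i, f in enumerate(code):
        for alt in CONFUSABLE.get(f, []):
            yield code[:i] + (alt,) + code[i + 1:]

def recognize(code, database):
    # Try the original code first, then each perturbed code, until one matches.
    for candidate in [code, *perturbations(code)]:
        cluster = database.get(preclass_code(candidate), {})
        char = detailed_match(candidate, cluster)
        if char is not None:
            return char
    return None  # no match and no perturbed codes left

# Hypothetical database: one cluster holding two characters.
# The loose positive code of 十 needs a negative code to exclude 丁.
DATABASE = {
    (2, "H"): {
        "十": {"positive": [{0: "H"}], "negative": [{1: "J"}]},
        "丁": {"positive": [{1: "J"}], "negative": []},
    }
}

print(recognize(("H", "V"), DATABASE))  # matches directly
print(recognize(("T", "V"), DATABASE))  # matches only after perturbing stroke 0
```

In this sketch a perturbed code is re-run through preclassification as well as detailed matching, which mirrors the Chinese abstract's statement that perturbation yields both a new stroke structural sequence code and a new preclassification code. Only single-feature perturbations are tried here; how many features the original method perturbs at once is not stated in the record.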
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT840392086 http://hdl.handle.net/11536/60434
Appears in Collections:	Thesis