Title: Handwritten Chinese Character Recognition Based on Automatically Generated Stroke Structural Sequence Codes (以自動產生的文字筆序結構碼進行手寫中文字辨識)
Authors: 李其瑋 (Lee, Chi-Wei); 陳稔 (Zen Chen)
Institute of Computer Science and Engineering
Keywords: Handwritten Chinese character; Character recognition; Stroke sequence; Stroke structural sequence code; Feature perturbation
Issue Date: 1995
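Abstract: This dissertation studies how to generate stroke structural sequence codes from handwritten Chinese characters and how to apply them to Chinese character recognition. It deals mainly with handwritten Chinese characters input on-line. Once a suitable stroke extraction procedure is added in the future, the method can also be used for off-line handwritten Chinese characters. The dissertation consists of three parts. The first part proposes a set of stroke ordering rules that automatically arrange the strokes of a Chinese character into a stroke order. The rules use no special radical knowledge or knowledge of how character blocks are laid out, so they are easy to implement on a computer. The resulting stroke order is the same as, or very close to, the standard stroke order given in dictionaries. To tolerate writing variations, we take the parts of the stroke structural sequence code that are least affected by writing as the preclassification code. This code can then be used to preclassify handwritten Chinese characters.

The second part proposes a stroke feature perturbation method to handle variations in handwritten Chinese characters. The basic idea is to guess which features may be erroneous and restore them to their correct values. The goal is to correct the erroneous features of the input sample and so obtain the correct recognition result. The method can be applied to input samples to correct features made erroneous by writing variations. It can also be applied to the character database to automatically generate representative samples of each character's writing variants.

The third part searches the clusters produced by preclassification for suitable features with which to build detailed matching codes. Stable, consistent features of a character are combined into positive detailed matching codes, and each character in the database is represented by one or more positive codes. To keep the number of detailed matching codes as small as possible, negative detailed matching codes can also be used to exclude other, similar characters. Recognition has three stages: preclassification, detailed matching, and stroke feature perturbation. An input sample is first assigned to a cluster by its preclassification code. It is then compared with the detailed matching codes of each character in that cluster. If it satisfies a character's detailed matching codes, recognition is complete. If it satisfies none of them, the stroke feature perturbation method generates a new stroke structural sequence code and preclassification code, and recognition is repeated.

We verify the proposed methods with a series of experiments. The results show that the methods handle the problems that writing variations cause in preclassification and detailed matching, and that they achieve good recognition performance.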
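The three-stage recognition loop described above can be sketched in Python. The stages are preclassification, then detailed matching, then stroke feature perturbation, repeated until a match is found or no new perturbed code remains. This is a minimal illustration under stated assumptions, not the thesis's implementation. The following are all hypothetical:

- the per-stroke `(stroke_type, relation)` encoding;
- the choice of stroke types as the stable preclassification features;
- the dictionary form of a detailed matching code;
- the confusion table `alternatives`;
- every function name.

Perturbed codes are explored breadth-first with a visited set, so the loop ends once no unseen code can be generated.

```python
from collections import deque

# Hypothetical encoding, not the thesis's actual format: a sample is a tuple of
# per-stroke features (stroke_type, relation), listed in stroke order.


def preclassification_code(code):
    # Assumption: stroke types are the writing-stable part of the code.
    return tuple(stroke_type for stroke_type, _ in code)


def satisfies(code, pattern):
    # A detailed matching code is modelled as {stroke index: required relation}.
    return all(i < len(code) and code[i][1] == rel for i, rel in pattern.items())


def detailed_match(code, cluster):
    # cluster maps char -> (positive_codes, negative_codes); a char is accepted
    # when the sample meets at least one positive code and no negative code.
    for char, (positives, negatives) in cluster.items():
        if any(satisfies(code, p) for p in positives) and not any(
            satisfies(code, n) for n in negatives
        ):
            return char
    return None


def perturbations(code, alternatives):
    # alternatives is a hypothetical confusion table: feature value -> values a
    # writer may have produced instead. Yields codes differing in one feature.
    for i, (stroke_type, rel) in enumerate(code):
        for alt in alternatives.get(stroke_type, ()):
            yield code[:i] + ((alt, rel),) + code[i + 1:]
        for alt in alternatives.get(rel, ()):
            yield code[:i] + ((stroke_type, alt),) + code[i + 1:]


def recognize(code, clusters, alternatives):
    # Breadth-first: preclassify, match in detail, otherwise perturb and retry.
    # Stops on a match or when no unseen perturbed code remains.
    queue, seen = deque([code]), {code}
    while queue:
        current = queue.popleft()
        cluster = clusters.get(preclassification_code(current), {})
        char = detailed_match(current, cluster)
        if char is not None:
            return char
        for candidate in perturbations(current, alternatives):
            if candidate not in seen:
                seen.add(candidate)
                queue.append(candidate)
    return None
```

As an example, take a hypothetical cluster keyed by `("H", "V")` that holds 十 with the positive code `{1: "cross"}`. A sample whose vertical stroke was labelled `"P"` (left-falling) fails preclassification. The single substitution `"P"` → `"V"` from `alternatives` then moves it into that cluster, where it matches. Breadth-first order tries codes with fewer changed features first. The search space can still grow quickly with code length.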
The dissertation is concerned with the automatic generation of stroke structural sequence codes from handwritten Chinese characters and their use in the recognition problem. In the current implementation, we consider primarily the on-line handwritten Chinese character recognition problem. The system can also be used for off-line characters if the character strokes are extracted in advance. The dissertation consists of three parts.

In the first part, a set of stroke ordering rules that produces a unique stroke sequence for each Chinese character is proposed. The rules require no special radical knowledge or knowledge of character block layout forms, so they are easy to implement on a computer. In addition, the derived stroke sequences are close to, if not identical with, those given in the dictionary. To deal with writing variations among writers, we generalize the derived stroke structural sequence code to obtain more consistent stroke information. This generalized stroke structural sequence code is then used for handwritten Chinese character preclassification.

In the second part, we propose a method based on the perturbation technique to handle the variations in handwritten Chinese characters. The basic concept of our perturbation technique is to recover possibly erroneous stroke features by replacing them with new perturbed feature values, so that the resulting character features may become normal. The perturbation technique can be applied to the input samples to recover stroke features made erroneous by handwriting variations. It can also be applied to the reference characters in the database to enlarge the database.

In the third part, we design the detailed matching codes using new consistent stroke features other than those used in the preclassification code. After preclassification, each cluster contains one or more characters. A character in a cluster is called a legitimate character if the preclassification code associated with the cluster is the correct one for that character. If a particular character sample has some erroneous features, then its preclassification code is incorrect. We only need to design the detailed matching codes for the legitimate characters in the cluster. The detailed matching codes must satisfy the completeness and consistency conditions. We use one or more positive codes to meet the completeness condition and, if possible, the consistency condition as well. If the positive code(s) do not satisfy the consistency condition, negative codes are created.

The overall process for handwritten Chinese character recognition consists of three stages: preclassification, detailed matching, and stroke feature perturbation. Each input character sample is first classified by its preclassification code. Once the sample is grouped into a preclassification cluster, it is matched on its stroke structural sequence code against the positive and negative codes of the legitimate characters in that cluster. If a match is found, the recognition process terminates. If no match is found, the proposed stroke feature perturbation technique is applied to the input character to obtain a perturbed stroke structural sequence code, and a new recognition cycle begins. The process ends when a match is found or no new perturbed codes are possible.

Experiments are included to evaluate the ideas above. The results indicate that the proposed methods can handle the preclassification and detailed matching problems caused by handwriting variations among writers. The methods thus achieve a high recognition rate on the difficult problem of handwritten Chinese character recognition.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT840392086
http://hdl.handle.net/11536/60434
Appears in Collections: Thesis