Knowledge-based Radical Extraction for Handwritten Chinese Characters

Title:	以知識為本的中文字部首抽取 Knowledge-based Radical Extraction for Handwritten Chinese Characters
Authors:	曾逸鴻 Yi-Hong Tseng; 李錫堅 Hsi-Jian Lee; Institute of Computer Science and Engineering
Keywords:	Knowledge; Stroke extraction; Reference radical; Dynamic programming; Candidate radical; Post-checking; Maximum clique
Issue Date:	1993
Abstract:	This thesis uses knowledge about radicals to speed up handwritten Chinese character recognition and to improve its recognition rate. We assume that the input is an imperfect stroke extraction result; if the component radicals can be extracted correctly from it, character recognition is greatly simplified. First, we define about 400 reference radicals that compose more than 2000 Chinese characters. Based on experiments and observation, we attach to these reference radical models knowledge that describes their structural properties. In the recognition process, a 2-D Chinese character image is first preprocessed by simple radical separation and stroke extraction. Before radical matching, which is done by dynamic programming, a pre-selection step uses the possible positions of radicals and the presence of stable stroke types to choose suitable extracted strokes and reference radical models for matching. The candidate radicals produced by matching are then checked for legality by three post-checking methods: relative stroke length checking, convex hull checking (how tightly a radical's strokes cluster), and radical overlap checking. Illegal candidates are removed. Finally, an undirected graph is constructed from the remaining legal candidate radicals, and the correct radicals are extracted by finding its maximum clique. The system also supports tablet input: the computer automatically builds each reference radical model from the on-line information captured by the tablet and inserts it into the proper position in the knowledge base, which makes the radical database easy to extend. The test characters were selected at random from the CCL/HCCR1 handwritten Chinese character database, and the experimental results show the feasibility of the proposed radical extraction method.
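The final selection step described in the abstract (building an undirected graph over legal candidate radicals and taking its maximum clique) can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not say how graph edges are defined, so the sketch assumes that two candidates are compatible when they share no extracted stroke. The candidate labels (A to D), the stroke indices, and the function names are hypothetical, and the search is a plain Bron-Kerbosch enumeration chosen for brevity.

```python
from itertools import combinations


def build_compatibility_graph(candidates):
    """Build an undirected graph over candidate radicals.

    candidates maps a candidate label to the set of stroke indices it uses.
    Assumption: two candidates are compatible (joined by an edge) when
    they use disjoint sets of strokes.
    """
    graph = {name: set() for name in candidates}
    for a, b in combinations(candidates, 2):
        if candidates[a].isdisjoint(candidates[b]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximum_clique(graph):
    """Return a largest clique, found by Bron-Kerbosch enumeration."""
    best = set()

    def expand(clique, pool, excluded):
        nonlocal best
        if not pool and not excluded:
            # clique is maximal; keep it if it is the largest seen so far
            if len(clique) > len(best):
                best = set(clique)
            return
        for v in list(pool):
            expand(clique | {v}, pool & graph[v], excluded & graph[v])
            pool = pool - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return best


# Hypothetical candidates for a character with seven extracted strokes (0..6).
candidates = {
    "A": {0, 1, 2},
    "B": {3, 4},
    "C": {5, 6},
    "D": {2, 3},  # conflicts with A (stroke 2) and B (stroke 3)
}
selected = maximum_clique(build_compatibility_graph(candidates))
print(sorted(selected))
```

In this toy input, candidate D shares strokes with A and B, so the largest mutually compatible set is A, B, and C. A real system would likely also weigh matching scores when several cliques have the same size, which the abstract does not specify.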
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT820392063 http://hdl.handle.net/11536/57871
Appears in Collections:	Theses