Title: Line Sweep Thinning Algorithm and its Application to Chinese Character Recognition (應用掃瞄細線化演算法於中文文字辨識系統)
Authors: Fu-Zen Chuang (莊富任)
Tyne Liang (梁婷)
Fu Chang (張復)
Institute of Computer Science and Engineering
Keywords: Thinning; Line Sweep Thinning Algorithm; Character Feature Extraction; Chinese Character Recognition
Publication date: 1998
Abstract: Character recognition is a key technique for automatically converting paper documents into computer-readable form, which makes it easier to retrieve useful information through channels such as the Internet or intranets. However, converting documents and characters of widely varying form into electronic text without error remains difficult. In this thesis, the line sweep thinning algorithm is applied to Chinese character recognition to obtain character skeletons and stroke-intersection information. From this information, feature vectors are extracted using pixel labeling and a zoning method. For training and classification, we propose an iterative multi-layer training model that generates a multiple-entry dictionary for a multi-font character recognizer. Experimental results show good recognition performance across different fonts, including Ming, Thin Ming, Bold Black, Fang Song, and Kai.
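The zoning step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the grid size, the per-zone statistic, or how pixel labels enter the feature vector, so everything below is an assumption: the function name `zoning_features`, the 2x2 grid, and the use of plain foreground density per zone are hypothetical choices for illustration, not the method of the thesis.

```python
from typing import List


def zoning_features(image: List[List[int]], zones_y: int, zones_x: int) -> List[float]:
    """Split a binary image into a zones_y x zones_x grid and return the
    fraction of foreground (1) pixels in each zone, in row-major order.

    Assumption: foreground density is used as the per-zone feature; the
    thesis may use a different statistic derived from its pixel labels.
    """
    if not image or not image[0]:
        raise ValueError("image must be a non-empty 2D list")
    height, width = len(image), len(image[0])
    features = []
    for zy in range(zones_y):
        # Integer arithmetic spreads any remainder rows across zones.
        y0 = zy * height // zones_y
        y1 = (zy + 1) * height // zones_y
        for zx in range(zones_x):
            x0 = zx * width // zones_x
            x1 = (zx + 1) * width // zones_x
            area = (y1 - y0) * (x1 - x0)
            count = sum(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
            features.append(count / area if area else 0.0)
    return features


# Hypothetical 4x4 skeleton: one horizontal stroke crossing one vertical stroke.
skeleton = [
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
print(zoning_features(skeleton, 2, 2))  # [0.75, 0.5, 0.5, 0.0]
```

A density per zone keeps the feature vector a fixed length regardless of image size, which is one common reason zoning is paired with a dictionary-based classifier.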
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT870394074
http://hdl.handle.net/11536/64217
Appears in Collections: Thesis