Title: | Design of a mathematical expression understanding system |
Authors: | Lee, HJ; Wang, JS; National Chiao Tung University; Department of Computer Science |
Keywords: | character segmentation;character recognition;expression formation;error correction |
Publication date: | 1-Mar-1997 |
Abstract: | A scientific document usually consists of text and mathematical expressions. In this paper, we present a system for segmenting and understanding text and mathematical expressions in a document. The system can be divided into six stages: page segmentation and labeling, character segmentation, feature extraction, character recognition, expression formation, and error correction and expression extraction. After we extract all text lines in a document, we separate all symbols in each text line and calculate direction-feature vectors and aspect ratios for those symbols. Then, a nearest-neighbor algorithm recognizes characters. In the expression formation stage, we build a symbol relation tree for each text line that represents the relationships among the symbols in the text line. Each text line is decomposed into a collection of primitive tokens: operands, operators and separators. Heuristic rules based on these primitive tokens are used to correct text recognition errors. Finally, we extract all mathematical expressions according to basic expression forms. Several pages of documents were scanned to test the method. All mathematical expressions are understood. In the expressions generated, a few symbols are misrecognized. The average recognition rate was 96.16%. (C) 1997 Elsevier Science B.V. |
URI: | http://hdl.handle.net/11536/695 |
ISSN: | 0167-8655 |
Journal: | PATTERN RECOGNITION LETTERS |
Volume: | 18 |
Issue: | 3 |
Start page: | 289 |
End page: | 298 |
Appears in Collections: | Articles |
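The character recognition stage named in the abstract (a nearest-neighbor algorithm over direction-feature vectors and aspect ratios) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `nearest_neighbor`, the `PROTOTYPES` table, the five-element feature layout (four direction values plus one aspect ratio), the feature values themselves, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math


def nearest_neighbor(sample, prototypes):
    """Return the label of the prototype closest to `sample`.

    Euclidean distance is an assumption; the abstract does not state
    which distance measure the system uses.
    """
    best_label, best_dist = None, math.inf
    for label, vector in prototypes:
        dist = math.dist(sample, vector)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical prototypes: four direction-feature values followed by an
# aspect ratio (width / height). The numbers are made up for illustration.
PROTOTYPES = [
    ("1", (0.9, 0.0, 0.1, 0.0, 0.3)),
    ("-", (0.0, 1.0, 0.0, 0.0, 5.0)),
    ("x", (0.1, 0.1, 0.4, 0.4, 1.0)),
]

# A wide, mostly horizontal symbol is matched to the "-" prototype.
label = nearest_neighbor((0.05, 0.9, 0.05, 0.0, 4.6), PROTOTYPES)
```

In this sketch the aspect ratio is simply appended to the direction features, so it dominates the distance for very wide or very tall symbols; a real system would likely normalize or weight the features, which the abstract does not describe.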