Title: 多值化的地圖文字抽取
Multi-thresholding Character Extraction in a Map
Authors: 楊希明
Hsi-Ming Yang
李錫堅
Hsi-Jian Lee
Institute of Computer Science and Engineering
Keywords: Geographical Information System; Multi-thresholding; Blurring
Issue Date: 1994
Abstract: Maps provide many kinds of information, such as countries, cities, and rivers, that are useful in a geographic information system (GIS). Automatically extracting this information from a map to build a database for user retrieval is one of the goals of a GIS. Character extraction is one of the essential tasks in entering map information into a computer. Because the input to the system is a grayscale map, we propose a method that extracts the Chinese characters in the map using a multi-thresholding scheme. The method consists of two phases. In the first phase, we extract Chinese characters from the map, which is represented by multi-level values. After binarization, the character extraction operations, which comprise three processes (blurring, connected component extraction, and rotation angle detection), are performed repeatedly with different thresholds. Once characters are extracted from the map, they are sent to a statistical character recognition module and subtracted from the map. In the second phase, we extract simple components from the remaining map by the run-length method and remove long components, which may be road lines. The character extraction operations used in the first phase are then performed again to extract characters. These extracted characters are also sent to the recognition module. Our test sample maps contain 571 Chinese characters, of which 471 are correctly extracted. The extraction rate of our system is 82.31%.
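The first-phase loop described in the abstract (binarize at a threshold, blur, extract connected components, remove what was found, repeat with the next threshold) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the square blur window, the 4-connectivity, and the `min_side`/`max_side` size filter are all assumptions made for illustration, and rotation angle detection, recognition, and the second-phase run-length step are omitted.

```python
from collections import deque

def binarize(gray, threshold):
    """Mark pixels darker than `threshold` as foreground (1)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def blur(binary, radius=1):
    """Smear each foreground pixel over a square window so nearby strokes merge.
    (Assumed stand-in for the blurring step; the thesis may use another filter.)"""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out

def connected_components(binary):
    """Return bounding boxes (top, left, bottom, right) of 4-connected regions."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                top, left, bottom, right = y, x, y, x
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

def extract_candidates(gray, thresholds, min_side=2, max_side=8):
    """Repeat extraction over several thresholds, erasing each accepted region.
    The size limits are hypothetical; they only reject blobs too long to be a character."""
    work = [row[:] for row in gray]  # leave the caller's image untouched
    found = []
    for t in thresholds:
        for top, left, bottom, right in connected_components(blur(binarize(work, t))):
            height, width = bottom - top + 1, right - left + 1
            if min_side <= height <= max_side and min_side <= width <= max_side:
                found.append((top, left, bottom, right))
                for y in range(top, bottom + 1):      # subtract the region from the map
                    for x in range(left, right + 1):
                        work[y][x] = 255
    return found
```

Calling `extract_candidates(gray, [100, 200])` on a grayscale image given as a list of rows would return the bounding boxes of character-sized blobs found at either threshold, while skipping long components such as road lines.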
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT830392035
http://hdl.handle.net/11536/58957
Appears in Collections: Thesis