Multi-thresholding Character Extraction in a Map

Title:	多值化的地圖文字抽取 (Multi-thresholding Character Extraction in a Map)
Authors:	楊希明 (Hsi-Ming Yang); 李錫堅 (Hsi-Jian Lee); Institute of Computer Science and Engineering
Keywords:	Geographical Information System; Multi-thresholding; Blurring
Issue Date:	1994
Abstract:	In a geographic information system (GIS), maps provide useful information about countries, cities, rivers, and other features. Automatically extracting this information from a map and building a database that users can query is one of the goals of a GIS, and it is the goal of this research. Character extraction is an essential task in entering map information into a computer. Because the input to the system is a grayscale map, we propose a multi-thresholding method for extracting the Chinese characters in the map. The method consists of two phases. In the first phase, we extract Chinese characters from the map, which is represented by multiple threshold levels. After the map is binarized, character extraction is performed repeatedly, once for each threshold. Character extraction comprises three steps: blurring, connected component extraction, and rotation angle detection. Once characters are extracted, they are sent to a statistical character recognition module and subtracted from the map. In the second phase, we extract simple components from the remaining map using the run-length method in order to remove long components, which may be road lines. The character extraction steps of the first phase are then applied again, and the extracted characters are likewise sent to the recognition module. Our test maps contain 571 Chinese characters, of which 471 are correctly extracted. The extraction rate of our system is 82.31%.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT830392035 http://hdl.handle.net/11536/58957
Appears in Collections:	Thesis
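
The first phase described in the abstract (binarize at each threshold, blur, extract connected components, remove what was found) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the use of a square dilation as a stand-in for blurring, and the bounding-box size filter used in place of the recognition module are all assumptions made for illustration. Rotation angle detection and the second-phase run-length step are omitted.

```python
from collections import deque

def binarize(gray, threshold):
    """Mark pixels darker than the threshold as foreground (1)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def blur(binary, radius=1):
    """Smear foreground so nearby strokes merge (assumed: a square dilation)."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x]:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out

def connected_components(binary):
    """Return bounding boxes (top, left, bottom, right) of 8-connected regions."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny in range(max(0, cy - 1), min(h, cy + 2)):
                        for nx in range(max(0, cx - 1), min(w, cx + 2)):
                            if binary[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

def extract_candidates(gray, thresholds, min_size, max_size):
    """Repeat extraction per threshold; erase each candidate from the working map."""
    work = [row[:] for row in gray]
    found = []
    for t in thresholds:
        for top, left, bottom, right in connected_components(blur(binarize(work, t))):
            height, width = bottom - top + 1, right - left + 1
            # Assumed filter: character-sized boxes only; long lines are rejected.
            if min_size <= height <= max_size and min_size <= width <= max_size:
                found.append((top, left, bottom, right))
                for y in range(top, bottom + 1):
                    for x in range(left, right + 1):
                        work[y][x] = 255  # subtract the candidate from the map
    return found
```

Erasing each accepted box before moving to the next threshold mirrors the abstract's statement that extracted characters are subtracted from the map, so a region found at a low threshold is not reported again at a higher one.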