Title: 地圖中機關名稱之辨識
Recognition of Institution Names in Maps
Authors: 周澤安
Zhou, Ze-An
李錫堅
Hsi-Jian Lee
Institute of Computer Science and Engineering (資訊科學與工程研究所)
Keywords: 詞尾;辨認模組;循環連續字組;phrase-suffix;recognition module;circulated consecutive characters
Issue Date: 1995
Abstract: 在地理資訊系統中,地圖提供諸如﹕建築物、道路、河流等許多有用的資
訊。本研究將地圖中機關的資訊自動地擷取出來,建成有用的資
料庫系統。要將地圖中建築物的資訊存入電腦中,可分成兩個步驟,首先
先將文字抽取出並送入辨認模組;第二步則需對辨識結果加以分析,將正
確建築物名稱輸出。文字影像抽取的工作已完成,在文字的位置及大小為
已知的條件下,在文字影像被送入辨認模組前,仍需要做些處理,包括旋
轉角度偵測和文字影像的抽取。我們運用地圖中機關名稱方向和文字平行
或垂直的特性將旋轉角算出。在文字影像抽取方面,我們根據地圖中的色
彩特性,依序將背景及非文字影像去除,最後才將文字影像抽出。一旦文
字被抽取出來,將由統計式文字辨識模組去辨識。在第二階段中,我們運
用機關名稱中詞尾出現的位置來分開距離相近的建築物名稱、排列順序
等。我們使用前十名候選字做為第二階段輸入,以期適度的修正錯誤辨識
結果,及利用詞的循環連續字組建立詞典結構加快執行的速度。我們測試
樣本地圖八張,文字最初辨識率為81.02%,經過第二階段處理辨識率增加
為91.54%。另外,測試樣本地圖中共有112個機關名稱,其中82個最後輸
出完全正確結果,正確率為74%。
Maps provide useful information for geographic information
systems (GIS), such as building names, road names, and rivers.
This research automatically extracts institution names from a map
to build a database for user retrieval. Entering the institution
names in a map into a computer takes two main steps. First,
characters are extracted from the map image and sent to a
character recognition system. Second, the recognition results are
analyzed and the correct institution names are selected. We
assume that character localization has already been completed, so
the size and position of each character in the map are known.
Even so, the character images must be processed before they are
sent to the character recognition system. This processing
includes rotation angle detection, character image extraction,
and preprocessing for the character recognition system. To
compute the rotation angle of the characters in an institution
name, we use the property that the direction of the name is
either parallel or perpendicular to the direction of its
characters. For character extraction, we use the color features
of the map to remove the background and then the non-character
objects. Finally, the target characters are extracted by deleting
the portions that belong to other characters. The extracted
characters are then recognized by a statistical character
recognition system. In the second step, we use the position of
the phrase-suffix in institution names to separate names that lie
close together and to determine their arrangement order. The top
10 candidates for each character are the input to this step, from
which the most probable recognition results are selected. To
speed up look-up, we construct a new dictionary structure indexed
by circulated consecutive characters. In the experiments, eight
maps were tested and the initial character recognition rate was
81.02%. After the institution names were processed in the second
step, the recognition rate rose to 91.54%. Of the 112 institution
names in these maps, 82 were identified completely correctly, an
identification rate of 74%.
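The second step can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not define "circulated consecutive characters", so the sketch assumes they are the pairs of adjacent characters in a name, with the last character wrapping around to the first. The function names (cyclic_pairs, build_index, match_name), the Latin placeholder names, and the scoring rule (count the positions whose candidate list contains the dictionary character) are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def cyclic_pairs(name):
    """Yield each pair of adjacent characters, wrapping from last to first.

    Assumed reading of "circulated consecutive characters".
    """
    n = len(name)
    for i in range(n):
        yield name[i] + name[(i + 1) % n]


def build_index(names):
    """Map every cyclic character pair to the dictionary names containing it."""
    index = defaultdict(set)
    for name in names:
        for pair in cyclic_pairs(name):
            index[pair].add(name)
    return index


def match_name(candidates, index):
    """Pick the dictionary name best supported by per-position candidates.

    candidates[i] is the list of candidate characters (e.g. the top 10)
    that the recognizer proposed for position i of one institution name.
    """
    n = len(candidates)
    # Shortlist only names that share a cyclic pair with adjacent candidates,
    # so the whole dictionary never has to be scanned.
    shortlist = set()
    for i in range(n):
        for a, b in product(candidates[i], candidates[(i + 1) % n]):
            shortlist |= index.get(a + b, set())
    best, best_score = None, 0
    for name in sorted(shortlist):
        if len(name) != n:
            continue
        # Hypothetical score: positions whose candidates include the character.
        score = sum(ch in cands for ch, cands in zip(name, candidates))
        if score > best_score:
            best, best_score = name, score
    return best


# Placeholder dictionary; the top-ranked reading "CIFYHALL" has one error.
index = build_index(["CITYHALL", "CITYBANK", "POSTOFFICE"])
candidates = [["C", "G"], ["I", "L"], ["F", "T"], ["Y", "V"],
              ["H", "N"], ["A", "R"], ["L", "I"], ["L", "E"]]
print(match_name(candidates, index))
```

In this toy run the third character is misrecognized in the top-ranked result, but the correct character appears lower in its candidate list, so the dictionary match can still recover the full name. This is the kind of correction the abstract attributes to using the top 10 candidates.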
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT840392040
http://hdl.handle.net/11536/60384
Appears in Collections: Thesis