Title: Recognition of Institution Names in Maps (地圖中機關名稱之辨識)
Authors: Zhou, Ze-An (周澤安); Lee, Hsi-Jian (李錫堅)
Department: Institute of Computer Science and Engineering
Keywords: phrase suffix; recognition module; circulated consecutive characters
Publication date: 1995
Abstract: In a geographic information system (GIS), maps provide many useful kinds of information, such as building names, road names, and rivers. This research automatically extracts institution names from a map to build a database for user retrieval. Entering the institution names in a map into a computer takes two main steps. First, characters are extracted from the map image and sent to a character recognition system. Second, the recognition results are analyzed and the correct institution names are selected. We assume that the characters have already been located, so the size and position of each character in the map are known. The character images must still be processed before they are sent to the recognizer; this processing comprises rotation angle detection, character extraction, and preprocessing for the recognizer. To compute the rotation angle of the characters in an institution name, we use the property that the direction of the name is either parallel or perpendicular to the direction of its characters. For character extraction, we use the color features of the map to remove, in turn, the background and the non-character components; the correct characters are then extracted by deleting any portions that belong to other characters. The extracted characters are finally recognized by a statistical character recognition system.
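The rotation-angle step can be illustrated with a short sketch. It assumes the center of each character in one institution name is already known (the abstract states that character positions and sizes are given), estimates the name's direction as the principal axis of those centers, and returns the two character rotations consistent with the parallel-or-perpendicular property: one for a name written left to right and one for a name written top to bottom. The function names and the principal-axis fit are illustrative assumptions, not the thesis's own procedure. Angles are measured counterclockwise with the y axis pointing up (with image coordinates, where y points down, the sense is clockwise), and the sketch does not resolve the 180° upside-down ambiguity.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def name_direction(centers: List[Point]) -> float:
    """Direction of an institution name, in degrees within [0, 180).

    The direction is the principal axis of the character centers
    (a total-least-squares line fit), which also handles vertically
    written names whose x coordinates are nearly constant.
    """
    n = len(centers)
    if n < 2:
        raise ValueError("need at least two character centers")
    mx = sum(x for x, _ in centers) / n
    my = sum(y for _, y in centers) / n
    sxx = sum((x - mx) ** 2 for x, _ in centers)
    syy = sum((y - my) ** 2 for _, y in centers)
    sxy = sum((x - mx) * (y - my) for x, y in centers)
    # Orientation of the major axis of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.degrees(theta) % 180.0


def candidate_rotations(centers: List[Point]) -> Tuple[float, float]:
    """Character rotations consistent with the parallel/perpendicular property.

    Returns (horizontal, vertical): the rotation if the name direction is
    parallel to the characters' horizontal axis (left-to-right writing),
    and the rotation if it is parallel to their vertical axis
    (top-to-bottom writing). Both are reduced modulo 180 degrees.
    """
    d = name_direction(centers)
    return d % 180.0, (d - 90.0) % 180.0


if __name__ == "__main__":
    # Hypothetical centers of a name running diagonally up and to the right.
    centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0)]
    print(candidate_rotations(centers))
```

A principal-axis fit is used here instead of an ordinary y-on-x regression because the latter breaks down for vertically written names, which are common on Chinese maps.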
In the second step, we use the position of phrase suffixes in institution names to separate institution names that lie close together and to determine their arrangement order. We use the top 10 candidates for each character as the input to this step, so that the most probable recognition results can be selected and some recognition errors corrected. To speed up dictionary look-up, we construct a new dictionary structure indexed by circulated consecutive characters. In experiments on eight maps, the initial character recognition rate is 81.02%; after the institution-name understanding step, it rises to 91.54%. Of the 112 institution names in these maps, 82 are identified completely correctly, giving an identification rate of 74% for institution names.
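The dictionary look-up can be sketched as follows, under the assumption that "circulated consecutive characters" means the adjacent character pairs of a name including the wrap-around pair from its last character back to its first. Each pair indexes the dictionary names that contain it; the recognizer's top-k candidate lists generate the pairs to look up, and each retrieved name is scored by how many positions its characters appear among the candidates, taken over every circular starting offset. The scoring rule, the helper names (`circular_pairs`, `build_index`, `match_score`, `best_word`), and the sample dictionary are hypothetical; the thesis's actual index layout and selection criteria may differ, and candidate rank is ignored here for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Index = Dict[str, Set[str]]


def circular_pairs(word: str) -> List[str]:
    """Consecutive character pairs of a word, including the wrap-around pair."""
    n = len(word)
    if n < 2:
        return []
    return [word[i] + word[(i + 1) % n] for i in range(n)]


def build_index(words: List[str]) -> Index:
    """Map every circular pair to the dictionary words that contain it."""
    index: Index = defaultdict(set)
    for w in words:
        for pair in circular_pairs(w):
            index[pair].add(w)
    return dict(index)


def match_score(word: str, candidates: List[List[str]]) -> int:
    """Largest number of positions whose candidate list contains the word's
    character, taken over every circular starting offset of the word."""
    n = len(word)
    return max(
        sum(word[(i + r) % n] in candidates[i] for i in range(n))
        for r in range(n)
    )


def best_word(candidates: List[List[str]], index: Index) -> Optional[Tuple[str, int]]:
    """Pick the dictionary word that best explains the top-k candidate lists.

    candidates[i] holds the recognizer's top-k characters for position i.
    Returns (word, score), or None if no indexed pair is found.
    """
    n = len(candidates)
    retrieved: Set[str] = set()
    for i in range(n):
        # Every candidate pair at positions (i, i + 1), wrapping at the end.
        for a in candidates[i]:
            for b in candidates[(i + 1) % n]:
                retrieved |= index.get(a + b, set())
    same_length = sorted(w for w in retrieved if len(w) == n)
    if not same_length:
        return None
    # max() keeps the first of equal scores; sorting makes ties deterministic.
    return max(((w, match_score(w, candidates)) for w in same_length), key=lambda t: t[1])


if __name__ == "__main__":
    # Hypothetical dictionary and top-2 candidate lists; the top-1 choice
    # for the second character (道) is a recognition error.
    index = build_index(["交通大學", "清華大學", "市政府", "消防局"])
    candidates = [["交", "文"], ["道", "通"], ["大", "太"], ["學", "覺"]]
    print(best_word(candidates, index))  # → ('交通大學', 4)
```

In this example the correct name still wins even though the top-1 result for the second position is wrong, because 通 appears in that position's candidate list. The wrap-around pair also lets a name be retrieved when the extracted characters start at an unexpected position, for example the sequence 大, 學, 交, 通.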
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT840392040
http://hdl.handle.net/11536/60384
Appears in collection: Theses