Title: Identifying Items from Business Cards (名片欄位之辨識)
Authors: Ming-Yuan Chen (陳鳴遠)
Dr. Hsi-Jian Lee (李錫堅)
Institute of Computer Science and Engineering (資訊科學與工程研究所)
Keywords: business card (名片); identification (辨識); Item Identification; scorebook
Issue Date: 1998
Abstract: Business cards are an important tool for making and keeping contacts. To make a collection of cards easier to manage, this thesis presents a system that automatically identifies the items (fields) on a business card. The input is a card image that has already been pre-processed and whose characters have already been recognized by an optical character recognition (OCR) engine. We first divide the items on a card into two classes, major items and minor items, according to how commonly their contents appear across business cards. We then analyze how the items are arranged and build item identification rules from the characteristics of each item. For each card, the system builds a scorebook that records, for every block on the card, the score the block obtains under the rules of each item. Once all blocks have been scored against all items, each item is identified from the scores. Post-processing then (1) splits any block that contains more than one item, (2) revises item contents using keyword databases, (3) checks the characters adjacent to keywords and re-recognizes those that should not appear there, and (4) assigns the blocks that are still undecided according to the keywords of each item; blocks with no keyword information are assigned to the note item. In experiments on 100 horizontal and 100 vertical Chinese business cards, the overall item identification accuracy is 93.05%. The accuracy for vertical cards is 93.91%; for horizontal cards, the English abstract of the record gives 92.29% and the Chinese abstract gives 92.99%.
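The scorebook idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Block` fields, the three rule functions, the score values, and the threshold are hypothetical and are not the rules used in the thesis. The sketch also labels each block independently and leaves out the post-processing steps other than the fallback to the note item.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    text: str         # OCR output for one text block
    char_height: int  # character height in pixels (hypothetical feature)


# Hypothetical rules: each returns a non-negative score for one item.
def score_name(block):
    score = 0
    if 2 <= len(block.text) <= 4:   # names are short
        score += 1
    if block.char_height >= 30:     # names are often printed large
        score += 2
    return score


def score_phone(block):
    score = 0
    if re.search(r"TEL|電話", block.text, re.IGNORECASE):
        score += 2
    if len(re.findall(r"\d", block.text)) >= 7:
        score += 1
    return score


def score_email(block):
    return 3 if "@" in block.text else 0


RULES = {"name": score_name, "phone": score_phone, "email": score_email}


def build_scorebook(blocks):
    """One row per block, one score per item."""
    return [{item: rule(b) for item, rule in RULES.items()} for b in blocks]


def identify(blocks, threshold=2):
    """Label each block with its best-scoring item, or 'note' if none is convincing."""
    labels = []
    for scores in build_scorebook(blocks):
        item, best = max(scores.items(), key=lambda kv: kv[1])
        labels.append(item if best >= threshold else "note")
    return labels
```

Called on four blocks such as `Block("陳鳴遠", 40)`, `Block("TEL: 02-1234-5678", 12)`, `Block("user@example.com", 12)`, and `Block("Since 1998", 12)`, `identify` labels them name, phone, email, and note, in that order. Keeping the full score table rather than only the winning label is what would let later steps revisit a block, for example to split it or reassign it by keyword.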
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT870392023
http://hdl.handle.net/11536/64043
Appears in Collections: Theses