Title: | The Study of a Corpus-Based Chinese Phonetic Input System With Error Tolerant Ability (以語料為主具容錯能力之中文注音輸入研究) |
Authors: | Chien-Pang Wang (王建邦), Tyne Liang (梁婷); Institute of Computer Science and Engineering |
Keywords: | Error Tolerant; Chinese Phonetic Input; Bucket Hashing Index; Binding Force |
Publication date: | 2000 |
Abstract: | Besides word segmentation and word selection, a Chinese phonetic input system must provide an error-tolerance mechanism for the user's pronunciation. A robust tolerance mechanism can be placed after a personalized acoustic recognizer, and it also reduces the manual correction the user must make afterwards. This thesis proposes an error-tolerant phrase-matching search method for searching a very large Chinese corpus. The method is built on a bucket hashing index, which finds similar-pronunciation words quickly, so that the input system both tolerates user errors and responds in real time. Pronunciation tolerance is achieved by defining each user's region of similar pronunciations: the similar-pronunciation set is obtained from a voice recognizer and constructed with a connected-components algorithm, and the error-tolerant search then retrieves the similar-pronunciation words. For word selection, the system uses information on how characters bind into words, namely the binding force of bigrams, together with several heuristic set-selection rules, to determine the most likely output string of Chinese words. Finally, the system is evaluated on a real corpus, with comparison modules designed to test its tolerance and speed. Test results obtained with different tolerance ranges, and with the similar pronunciations of multiple speakers integrated, show that the proposed approach is feasible. |
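The abstract describes the similar-pronunciation sets and the bucket hashing index only at a high level. The following is a minimal sketch of one way the two could fit together, not the thesis's actual implementation. It assumes that the recognizer's output is available as pairs of confusable syllables, that each connected component of those pairs (found here with a union-find structure) forms one similar-pronunciation set, and that a word's bucket key is the sequence of its syllables' set representatives. Syllables are written in tone-numbered pinyin rather than zhuyin symbols for readability. The names `UnionFind`, `build_similar_sets`, and `BucketIndex` are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest used to find connected components of confusable syllables."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller string as representative so bucket keys are deterministic.
            self.parent[max(ra, rb)] = min(ra, rb)


def build_similar_sets(confusable_pairs):
    """Each connected component of the confusion graph becomes one similar-pronunciation set."""
    uf = UnionFind()
    for a, b in confusable_pairs:
        uf.union(a, b)
    return uf


class BucketIndex:
    """Hypothetical bucket index: words whose syllables fall in the same sets share a bucket."""

    def __init__(self, similar_sets):
        self.sets = similar_sets
        self.buckets = defaultdict(list)

    def key(self, syllables):
        # Replace every syllable by its set representative; the tuple is the hash key.
        return tuple(self.sets.find(s) for s in syllables)

    def add(self, word, syllables):
        self.buckets[self.key(syllables)].append((word, tuple(syllables)))

    def search(self, syllables):
        query = tuple(syllables)
        hits = self.buckets.get(self.key(query), [])

        def mismatches(entry):
            # Number of syllables that differ from the input (the tolerated errors).
            return sum(a != b for a, b in zip(entry[1], query))

        # One bucket probe returns every similar-pronunciation candidate; exact matches first.
        return [word for word, _ in sorted(hits, key=mismatches)]


if __name__ == "__main__":
    # Assume the recognizer reports that shi4 and si4 are often confused.
    index = BucketIndex(build_similar_sets([("shi4", "si4")]))
    index.add("事實", ["shi4", "shi2"])
    index.add("四十", ["si4", "shi2"])
    index.add("私事", ["si1", "shi4"])
    print(index.search(["si4", "shi2"]))  # ['四十', '事實']
```

Because every member of a set maps to the same representative, a lookup costs one hash probe regardless of how many confusable variants the input has; the cost is that the tolerance range is fixed when the sets are built. The bigram binding-force word selection step is not modeled here.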
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT890394075 http://hdl.handle.net/11536/66980 |
Appears in Collections: | Thesis |