Automatic Identification and Indexing of Chinese Multilingual Spoken Messages

Title:	Automatic Identification and Indexing of Chinese Multilingual Spoken Messages (語言辨識與檢索在中文口語處理之研究)
Authors:	Wei-Ho Tsai (蔡偉和); Wen-Whei Chang (張文輝); Sin-Horng Chen (陳信宏); Institute of Communications Engineering (電信工程研究所)
Keywords:	Chinese dialect identification; spoken language system; phonotactics; hidden Markov model; Gaussian mixture bigram model; discriminative training; segmentation/clustering
Date issued:	2000
Abstract:	This thesis addresses two techniques needed to build multilingual spoken language systems: automatically identifying the language or dialect a speaker is using, and automatically organizing stored speech so that it can be used to build language models or be retrieved quickly. The first part presents three approaches to Chinese dialect identification that exploit different levels of linguistic information, in order to evaluate their relative contributions. The first approach is based on phonotactic analysis following phonetic tokenization: speech is tokenized into broad phonetic classes determined by articulatory configuration, and hidden Markov models capture how often these classes occur and how they follow one another in each dialect. The second approach is based on pitch contour dynamics. Prosodic information matters because tone carries lexical meaning in Chinese dialects: syllables with the same phonetic composition can have different meanings when spoken with different tones. Used on its own, this cue still discriminates well among dialects. The third approach combines segmental and prosodic features in a single recognition framework, a syllable-configuration model built around a composite hidden Markov model. Experimental results indicate that the composite model integrates the information effectively and discriminates among the three major Chinese dialects spoken in Taiwan with a best accuracy of 89.3%. Beyond accuracy, the thesis also considers portability to languages or dialects for which no linguistic analysis is available. For this purpose it proposes a new stochastic model, the Gaussian mixture bigram model (GMBM), which builds the time correlation between acoustic feature frames directly into the model. Because the observations used in dialect-specific modeling are extracted directly from the acoustic features, the GMBM parameters can be estimated without any transcription of the training utterances. For greater efficiency, a minimum classification error algorithm is employed to train the GMBM-based dialect identification system discriminatively.
The second part of the thesis addresses automatic indexing of spoken messages when no information about the languages involved is available, with the aim of reducing the burden of manual labeling. The task is accomplished by partitioning unlabeled speech into segments that each contain only one language, and then grouping acoustically homogeneous segments into single-language clusters. The methods require neither prior knowledge of the languages nor training data. For language-based segmentation, GMBMs model the acoustic characteristics of each speech segment, and different dissimilarity measures between adjacent segments are used to decide whether the language has changed. For language clustering, the thesis proposes a new scheme based on vector clustering and compares it with conventional hierarchical clustering, finding it better suited to grouping speech by language.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT890435095 http://hdl.handle.net/11536/67374
Appears in Collections:	Thesis