Title: The Cluster-based Learning of Support Vector Machines and Its Application in Text-Independent Speaker Identification (以叢集為基礎的支撐向量機學習及其應用於語者辨識)
Authors: Sheng-Yu Sun (孫聖育)
Hsin-Chia Fu (傅心家)
Institute of Computer Science and Engineering
Keywords: Support Vector Machine; Support Vector; Speaker Identification; Clustering
Issue Date: 2003
Abstract: Grounded in statistical learning theory, the Support Vector Machine (SVM) performs well on classification and recognition problems such as pattern recognition and speaker identification. However, training an SVM requires a large amount of memory and a long computing time. For large data sets, this thesis proposes a cluster-based SVM learning method that reduces both. A k-means-based clustering step pre-screens the training data and keeps only the samples lying on the periphery of each cluster, that is, the samples with the greatest influence on the SVM separating hyperplane. Training on this subset speeds up learning and also lowers the number of support vectors, which makes recognition more efficient. We applied the method to text-independent speaker identification. With recognition accuracy almost unaffected, the training data were reduced by about 75% and the training time by about 85%. In addition, the resulting models contain roughly one quarter as many support vectors as an SVM trained on the full data set, which substantially improves recognition efficiency. Finally, we applied SVM classification to detecting weather-forecast segments in television news programs, where it also achieved good results.
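The data-selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not say how "boundary" samples are identified, so the sketch assumes they are the members farthest from their cluster centroid. The function names `kmeans` and `select_boundary` and the `keep_ratio` parameter are hypothetical, and the 0.25 default is chosen only to echo the reported reduction of about 75% in training data.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def nearest(p):
        # Index of the centroid closest to p (Euclidean distance).
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    # Final assignment against the final centroids.
    return centroids, [nearest(p) for p in points]


def select_boundary(points, k, keep_ratio=0.25, seed=0):
    """Indices of the outermost samples of each cluster (assumed criterion)."""
    centroids, assign = kmeans(points, k, seed=seed)
    selected = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        # Rank members from outermost to innermost.
        members.sort(key=lambda i: math.dist(points[i], centroids[c]),
                     reverse=True)
        n_keep = max(1, math.ceil(keep_ratio * len(members)))
        selected.extend(members[:n_keep])
    return sorted(selected)


# Toy data: two well-separated 2-D blobs standing in for one class's features.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(100)]
kept = select_boundary(data, k=2, keep_ratio=0.25)
print(len(kept), "of", len(data), "samples kept for SVM training")
```

In a speaker-identification setting, one would presumably run this selection separately on each speaker's feature vectors so that class labels are preserved, then pass only the retained samples to an SVM trainer of choice.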
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009117584
http://hdl.handle.net/11536/50236
Appears in Collections: Thesis


Files in This Item:

  1. 758401.pdf