Title: | The Study of Anchor Person Identification Using Audio Information (以語者辨識為基礎的主播認定之研究) |
Authors: | Raymond B.C. Fu (富博超); Hsin-Chia Fu (傅心家); Institute of Computer Science and Engineering |
Keywords: | News Browsing System; Speaker Identification; Gaussian Mixture Model |
Issue Date: | 2000 |
Abstract: | Images and audio can convey far more information than plain text, but the complexity of processing them still exceeds what current technology can handle. Researchers around the world are therefore working to develop more advanced techniques for processing images and audio, so that meaningful information can be extracted from them more efficiently. This thesis proposes a method for processing the audio track of news programs. The method uses acoustic characteristics to classify the audio so that a news program can be correctly segmented into anchor scenes and news stories. This segmentation is intended as a basis for further analysis of the news segments and for fully automatic news classification and retrieval, letting users reach the segments they want more quickly. The proposed method applies text-independent speaker identification: Mel cepstral coefficients serve as the training features, and a Gaussian mixture model for each speaker serves as the classifier, which is used to locate the scenes in which the anchor appears. A software system was also developed and implemented. Experimental results show that the method achieves better performance than some previously developed systems.
List of Figures
Abstract
Acknowledgements
1 Introduction
2 Fundamentals of Speaker Identification and Gaussian Mixture Model
2.1 The Speech Production Process
2.2 Signal Processing and Cepstral Analysis
2.2.1 Definitions and General Concepts
2.2.2 Liftering Operation
2.3 The Gaussian Mixture Speaker Model
2.4 K-means Clustering
2.5 Speaker Identification
3 Implementation
3.1 The Training of Gaussian Mixture Speaker Models
3.2 Anchor Person Identification Testing Process
4 Experimental Results and Evaluation
4.1 Experimental Results on Anchor Identification
4.2 As an Embedded Application in the News Video Browsing System
5 Conclusion
Bibliography |
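The approach summarized in the abstract (cepstral features scored against one Gaussian mixture model per speaker) can be illustrated with a minimal sketch. This is not the thesis implementation: this record does not state the feature dimension, the number of mixture components, or the decision rule, so the values below are assumptions. The feature vectors are synthetic stand-ins for Mel cepstral coefficients, the K-means initialization is assumed from the outline entry, and all function names (`kmeans`, `train_gmm`, `avg_log_likelihood`, `identify`) are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain K-means; returns k cluster centers used to initialize the GMM means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def log_gauss(X, means, variances):
    """Log density of every frame under every diagonal Gaussian, shape (N, K)."""
    diff = X[:, None, :] - means[None, :, :]
    log_norm = np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
    return -0.5 * (log_norm + (diff ** 2 / variances[None, :, :]).sum(axis=2))

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def train_gmm(X, k=4, iters=30, seed=0):
    """Fit a diagonal-covariance GMM with EM; returns (weights, means, variances)."""
    means = kmeans(X, k, seed=seed)
    variances = np.tile(X.var(axis=0) + 1e-3, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior probability of each component for each frame
        log_p = np.log(weights)[None, :] + log_gauss(X, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and (floored) variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 1e-3)
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    """Average per-frame log-likelihood of a segment under one speaker model."""
    weights, means, variances = gmm
    log_p = np.log(weights)[None, :] + log_gauss(X, means, variances)
    return logsumexp(log_p, axis=1).mean()

def identify(X, models):
    """Return the name of the speaker model that scores the segment highest."""
    return max(models, key=lambda name: avg_log_likelihood(X, models[name]))

# Synthetic 12-dimensional "cepstral" frames (assumption: real features
# would be computed from the news audio; these are random stand-ins).
rng = np.random.default_rng(0)
models = {
    "anchor": train_gmm(rng.normal(0.0, 1.0, size=(400, 12))),
    "reporter": train_gmm(rng.normal(3.0, 1.0, size=(400, 12))),
}
anchor_test = rng.normal(0.0, 1.0, size=(100, 12))
reporter_test = rng.normal(3.0, 1.0, size=(100, 12))
print(identify(anchor_test, models), identify(reporter_test, models))
```

In a real system each audio segment would be converted to a sequence of cepstral frames first, and segments attributed to the anchor model would mark candidate anchor scenes. A maximum average log-likelihood rule is used here only because it is the simplest choice for a sketch.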
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT890392081 http://hdl.handle.net/11536/66872 |
Appears in Collections: | Thesis |