Title: | The Study of Anchor Person Identification Using Audio Information (以语者辨识为基础的主播认定之研究) |
Authors: | Raymond B.C. Fu (富博超); Hsin-Chia Fu (傅心家); Institute of Computer Science and Engineering |
Keywords: | News Browsing System; Speaker Identification; Gaussian Mixture Model |
Issue date: | 2000 |
Abstract: | Images and audio can convey far more information than plain text. However, the complexity of processing such data still exceeds what current technology can handle. Researchers around the world are therefore studying this problem in depth, hoping to develop more advanced techniques for processing images and audio and for extracting meaningful information from them more efficiently. This thesis proposes a method for processing the audio information in news video programs. Using acoustic characteristics, the method classifies news programs and separates news scenes from anchor scenes, so that news segments can later be analyzed further and users can be offered fully automatic news classification and retrieval, letting them reach the segments they want more quickly. The method builds on existing text-independent speaker identification techniques: Mel-frequency cepstral coefficients serve as the training features, and Gaussian mixture models serve as the classifiers that identify speakers and thereby locate the scenes in which the anchor appears. A software system was also developed and implemented. Experimental results show that this method can achieve better performance than some previously developed systems.
Table of contents: List of Figures; Abstract; Acknowledgements; 1 Introduction; 2 Fundamentals of Speaker Identification and Gaussian Mixture Model; 2.1 The Speech Production Process; 2.2 Signal Processing and Cepstral Analysis; 2.2.1 Definitions and General Concepts; 2.2.2 Liftering Operation; 2.3 The Gaussian Mixture Speaker Model; 2.4 K-means Clustering; 2.5 Speaker Identification; 3 Implementation; 3.1 The Training of Gaussian Mixture Speaker Models; 3.2 Anchor Person Identification Testing Process; 4 Experimental Results and Evaluation; 4.1 Experimental Results on Anchor Identification; 4.2 As an Embedded Application in the News Video Browsing System; 5 Conclusion; Bibliography |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT890392081 http://hdl.handle.net/11536/66872 |
Appears in collections: | Thesis |