Title: | A Study On Classification, Segmentation And Retrieval For Generic Audio Data |
Authors: | 林瑞祥 Ruei-Shiang Lin; 陈玲慧 Dr. Ling-Hwei Chen; Institute of Computer Science and Engineering |
Keywords: | audio classification; audio segmentation; audio retrieval |
Release date: | 2004 |
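Abstract: | In recent years, the massive growth of multimedia data has made the effective management of multimedia databases a very important and challenging issue, and the retrieval and storage of multimedia data has therefore become an important research area. Because audio is ubiquitous in multimedia data and plays an important role in it, research on and analysis of audio data are important; content-based audio analysis in particular is both important and urgent.
Research on content-based audio analysis is still at a limited stage of development. Its main problems and research directions can be grouped into three areas: audio classification, audio segmentation, and audio retrieval. The main goal of this dissertation is to develop methods for these problems based on the spectrogram and on pattern recognition theory.
In general, audio classification is the most important processing step in audio content analysis, and the main shortcoming of current audio classification research is that it covers too few audio classes. Most methods only separate audio into speech and music. This is simple and easy, but it is not sufficient for today's multimedia data. To address this problem, we propose a new audio classification method that, in addition to speech and music, also handles hybrid audio types common in multimedia data, such as speech with background music and pop songs. The method relies on newly proposed audio features and a hierarchical classifier. Besides processing audio by its content, the system is designed with processing efficiency as a core concern.
Next, we propose an audio segmentation method based on audio classification. It rests on the observation that different kinds of audio show visually recognizable features in their spectrograms: in music, energy is concentrated along certain directions; in speech, energy is concentrated in certain frequency bands; and in random sounds such as noise, energy is spread over all directions. Based on this observation, we first apply Gabor wavelets to the spectrogram of each one-second unit of audio to enhance the directional distribution and proportion of its energy, and then analyze the directional distribution and proportion of energy in the enhanced spectrogram to classify the audio. The classification results are then used to segment the audio.
Finally, we propose a content-based audio retrieval method. Given a query clip supplied by the user, it can retrieve similar clips in a database, repeated passages within a piece of music, and pieces with the same melody performed differently, for example in different languages or by different people. This method also exploits the visually recognizable features in the spectrogram: Gabor wavelets enhance the directional distribution and proportion of energy in the spectrogram, and the responses of the Fourier spectrum of the enhanced spectrogram are used to identify the most effective enhanced spectrograms. Finally, feature selection and pattern recognition techniques are used to obtain the features needed for retrieval.
The methods proposed in this dissertation can be applied to multimedia data retrieval, audio browsing, and the design of digital library systems.
The recent emergence of multimedia and the tremendous growth of multimedia data archives have made the effective management of multimedia databases a very important and challenging task. Digital audio is an important and integral part of many multimedia applications, such as the construction of digital libraries. Thus, the demand for an efficient method to automatically analyze audio signals based on their content has become urgent. The major problems of automatic audio content analysis include audio classification, segmentation, and retrieval. In this dissertation, we propose three spectrogram-based methods to address audio classification, segmentation, and content-based retrieval.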
In addition to the general audio types, such as music and speech, considered in existing work, we also take hybrid-type sounds (speech with music background, speech with environmental noise background, and song) into account. These categories form the basic set needed for content analysis of audiovisual data. First, we present a hierarchical audio classification method that classifies audio signals into these basic audio types. Although the scheme covers a wide range of audio types, its complexity is low because the audio features are easy to compute, which makes online processing possible. The experimental results of the proposed method are quite encouraging. Next, based on Gabor wavelet features, we propose a non-hierarchical audio classification and segmentation method. The method first divides an audio stream into clips, each containing one second of audio. Each clip is then classified into either two classes (speech and music) or five classes (pure speech, pure music, song, speech with music background, and speech with environmental noise background). A merging step then combines the classified clips into segments. The experimental results demonstrate the effectiveness of the method. Finally, we propose a method for content-based retrieval of perceptually similar music pieces in audio documents. It allows the user to select a reference passage within an audio file and retrieve perceptually similar passages, such as repeating phrases within a music piece, similar music clips in a database, or the same song sung by different people or in different languages. The experimental results demonstrate the effectiveness of this method as well. The methods proposed in this dissertation can serve as basic components of an audio content analysis system or a digital library application. |
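The abstract describes the segmentation pipeline only in outline: one-second clips, Gabor-wavelet enhancement of the spectrogram's directional energy, per-clip classification, then merging. The Python/NumPy sketch below illustrates two of those steps under explicit assumptions; it is not the thesis's implementation. The function names `spectrogram`, `directional_energy` and `merge_clip_labels`, the 256-sample Hann frames with a 128-sample hop, the four orientations, the centre frequency `f0` and bandwidths, the frequency-domain Gabor design, and the isolated-clip smoothing rule are all hypothetical choices. The classifier that maps directional-energy features to one of the two or five classes is omitted, since the abstract does not specify it.

```python
import numpy as np


def spectrogram(x, frame=256, hop=128):
    """Magnitude STFT with a Hann window; rows = frequency bins, columns = frames.
    Frame and hop sizes are assumptions, not values from the thesis."""
    win = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def directional_energy(spec, n_orient=4, f0=0.25, sigma_r=0.08, sigma_t=0.08):
    """Share of spectrogram energy captured by each of n_orient Gabor filters.

    Filter k is tuned to ridges running at k * 180 / n_orient degrees from the
    time axis (0 deg = a steady horizontal line, such as a sustained tone).
    Hypothetical design: each filter is a Gaussian lobe on the 2-D Fourier
    plane of the image; by Parseval's theorem sum(|F * G|^2) is proportional
    to the energy of the (complex) filtered spectrogram, so no inverse FFT
    is needed.
    """
    img = spec - spec.mean()              # drop DC so only structure is measured
    power = np.abs(np.fft.fft2(img)) ** 2
    rows, cols = img.shape
    # U: spatial frequency along the time axis, V: along the frequency axis
    U, V = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    energy = np.empty(n_orient)
    for k in range(n_orient):
        ridge = k * np.pi / n_orient
        carrier = ridge + np.pi / 2       # Gabor carrier runs across the ridge
        u_r = U * np.cos(carrier) + V * np.sin(carrier)
        v_r = -U * np.sin(carrier) + V * np.cos(carrier)
        g = np.exp(-(u_r - f0) ** 2 / (2 * sigma_r ** 2)
                   - v_r ** 2 / (2 * sigma_t ** 2))
        energy[k] = np.sum(power * g ** 2)
    return energy / energy.sum()


def merge_clip_labels(labels):
    """Turn per-second clip labels into (start_s, end_s, label) segments."""
    smoothed = list(labels)
    # Hypothetical smoothing rule: a single clip whose two neighbours agree
    # with each other is treated as a misclassification and relabelled.
    for i in range(1, len(smoothed) - 1):
        if smoothed[i - 1] == smoothed[i + 1] != smoothed[i]:
            smoothed[i] = smoothed[i - 1]
    segments = []
    for i, label in enumerate(smoothed):
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], i + 1, label)  # extend current run
        else:
            segments.append((i, i + 1, label))              # start a new run
    return segments
```

For a one-second 1 kHz sine sampled at 8 kHz, almost all of the energy falls in the 0° filter, whereas white noise spreads it across all four orientations, in line with the abstract's contrast between directional (tonal) and noise-like sound. On the merging side, `merge_clip_labels(["speech", "speech", "music", "speech", "music", "music", "music"])` returns `[(0, 4, 'speech'), (4, 7, 'music')]`: the lone "music" clip at second 2 is absorbed into the surrounding speech.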
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT008723818 http://hdl.handle.net/11536/48001 |
Appears in collections: | Thesis |