Title: | 一個關於一般音訊資料之音訊分類,音訊分段及音訊檢索之研究 A Study On Classification, Segmentation And Retrieval For Generic Audio Data |
Authors: | 林瑞祥 Ruei-Shiang Lin 陳玲慧 Dr. Ling-Hwei Chen Institute of Computer Science and Engineering |
Keywords: | audio classification;audio segmentation;audio retrieval |
Issue Date: | 2004 |
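Abstract: | In recent years, the rapid growth of multimedia data has made the effective management of multimedia databases an important and challenging problem, and the retrieval and storage of multimedia data have consequently become an important research area. Because audio appears throughout multimedia data and serves as one of its important features, research on and analysis of audio data matter; analysis based on audio content is especially important and urgent.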
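Research on content-based audio analysis is still quite limited. The main open problems and directions can be grouped into three areas: audio classification, audio segmentation, and audio retrieval. The main goal of this dissertation is to develop methods for these problems based on the spectrogram, using pattern recognition and related theory.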
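In general, audio classification is the most important processing step in the content analysis of audio data. The main problem in current audio classification research is that too few classes are distinguished. Most methods divide audio into only two classes, speech and music; this is simple and easy, but it is not adequate for today's multimedia data. To address this problem, we propose a new audio classification method. Besides speech and music, it also considers hybrid audio types that are common in current multimedia data, such as speech mixed with background music and pop songs. The method relies on newly proposed audio features and a hierarchical classification scheme. In addition to using audio content as features, processing efficiency is a core design goal of the system.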
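Next, we propose an audio segmentation method based on audio classification. Its main idea rests on the fact that different types of audio exhibit visually distinguishable features in their spectrograms: for music, the energy in the spectrogram is concentrated along certain directions; for speech, the energy is concentrated in certain frequency bands; and for random audio such as noise, the energy appears in all directions. Based on this fact, we first use Gabor wavelets to enhance the directional distribution and scale of the energy in the spectrogram of each one-second unit of audio, and then classify the audio by analyzing the directional distribution and scale of the energy in the enhanced spectrogram. The classification results are then used to segment the audio.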
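Finally, we propose a content-based audio retrieval method. Given a query clip supplied by the user, it retrieves similar clips in the database, repeated passages within a piece of music, and pieces that share the same melody but differ in rendition, for example sung in a different language or by a different person. The main idea is again to exploit the visually distinguishable features in the spectrogram: Gabor wavelets are used to enhance the directional distribution and scale of the energy in the spectrogram, and the response of the Fourier spectrum of the enhanced spectrogram is used to identify the most effective spectrogram. Feature selection and pattern recognition theory are then used to find the features needed for retrieval.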
The methods proposed in this dissertation can be applied to multimedia data retrieval, audio browsing, and the design of digital library systems. The recent emergence of multimedia and the tremendous growth of multimedia data archives have made the effective management of multimedia databases a very important and challenging task. Digital audio is an important and integral part of many multimedia applications, such as the construction of digital libraries. Thus, the demand for an efficient method to automatically analyze audio signals based on their content has become urgent. The major problems of automatic audio content analysis include audio classification, segmentation, and retrieval. In this dissertation, we propose three spectrogram-based methods to address the problems of audio classification, segmentation, and content-based retrieval. Besides the general audio types such as music and speech tested in existing work, we take hybrid-type sounds (speech with music background, speech with environmental noise background, and song) into account. These categories are the basic sets needed in the content analysis of audiovisual data. First, a hierarchical audio classification method is presented to classify audio signals into the aforementioned basic audio types. Although the proposed scheme covers a wide range of audio types, its complexity is low because the audio features are easy to compute, which makes online processing possible. The experimental results of the proposed method are quite encouraging. Next, based on Gabor wavelet features, we propose a non-hierarchical audio classification and segmentation method. The proposed method first divides an audio stream into clips, each of which contains one second of audio. Then, each clip is classified into one of either two or five classes. The two classes are speech and music; the five classes are pure speech, pure music, song, speech with music background, and speech with environmental noise background. Finally, a merge technique is provided to achieve segmentation. 
The experimental results demonstrate the effectiveness of the method. Finally, we propose a method for content-based retrieval of perceptually similar music pieces in audio documents. It allows the user to select a reference passage within an audio file and retrieve perceptually similar passages, such as repeating phrases within a music piece, similar music clips in a database, or the same song sung by different people or in different languages. The experimental results demonstrate the effectiveness of the method. The methods proposed in this dissertation can be used as basic components when developing an audio content analysis system or a system for a digital library application. |
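The abstract describes segmentation as classifying one-second clips and then merging them, but does not specify the merge technique. The following is a minimal Python sketch under a simple assumption: consecutive clips with the same label are merged into one segment. The function name `merge_clips`, its parameters, and the example labels are hypothetical and are not taken from the dissertation.

```python
from itertools import groupby


def merge_clips(labels, clip_seconds=1.0):
    """Merge runs of identically labelled clips into (start, end, label) segments.

    Assumption: adjacent clips with the same label belong to one segment.
    The dissertation's actual merge technique is not described in the abstract.
    """
    segments = []
    position = 0  # index of the first clip in the current run
    for label, run in groupby(labels):
        run_length = sum(1 for _ in run)
        start = position * clip_seconds
        end = (position + run_length) * clip_seconds
        segments.append((start, end, label))
        position += run_length
    return segments


# Hypothetical classifier output, one label per one-second clip.
clip_labels = ["speech", "speech", "music", "music", "music", "song"]
segments = merge_clips(clip_labels)
```

Each returned tuple gives a segment's start time, end time, and class label in seconds from the beginning of the stream. A practical system would likely also smooth isolated misclassified clips before merging; that step is omitted here.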
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT008723818 http://hdl.handle.net/11536/48001 |
Appears in Collections: | Thesis |