Title: | Audio Classification and Segmentation Technique Using Fuzzy Neural Networks (利用模糊類神經網路之音頻信號分類與切割技術) |
Authors: | Jui-Cheng Chen (陳瑞正); Chin-Teng Lin (林進燈); Institute of Electrical and Control Engineering |
Keywords: | audio signal classification; zero-crossing rate; feature extraction; neural networks; audio signal analysis |
Issue date: | 2004 |
Abstract: | This thesis proposes an audio classification and segmentation system that classifies and segments audio files containing silence, pure speech, pure music, and song according to content type. We analyze and compare the features of these audio types and, based on the results, design a two-stage flow that classifies and segments the input sequentially. The flow begins with silence detection, which marks the silent portions of the input using a threshold. Stage 1 then divides the non-silent portions into pure speech and segments “with music components”. Stage 2 further divides the segments labeled “with music components” into pure music and song. Because traditional features perform poorly at separating pure music from song, we propose a new feature, the frequency variation of the top three peaks (FVTP). It captures the property that the spectral structure of song changes markedly over time, whereas that of pure music changes comparatively little, and it thereby improves pure music/song classification. As the core classifier, the system adopts an on-line self-constructing neural fuzzy inference network (SONFIN), which finds its own structure and parameters automatically and provides a strong neural fuzzy inference process. Exploiting these properties yields better classification results. Experiments show that the system achieves an average classification accuracy above 90%. The system can therefore serve as a front end for applications such as speech recognition and speaker identification, ensuring that their input matches what they expect and thereby improving their performance. |
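The flow described in the abstract (threshold-based silence detection, then two classification stages, then segmentation) can be sketched roughly as follows. This is only an illustrative outline and not the thesis implementation: the frame size, the energy-based silence test, the two-element feature tuple, and the `stage1`/`stage2` callables are all assumptions, standing in for the actual feature set (including FVTP) and the SONFIN classifier, whose details the abstract does not give.

```python
def frames(signal, size, hop):
    """Split a sequence of samples into fixed-size frames (incomplete tail dropped)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame, silence_threshold, stage1, stage2):
    """Silence check, then stage 1 (speech vs. with music), then stage 2 (music vs. song).

    stage1 and stage2 are hypothetical callables taking a feature tuple;
    in the thesis this role is played by SONFIN with its own features.
    """
    energy = short_time_energy(frame)
    if energy < silence_threshold:  # assumed: silence is detected by frame energy
        return "silence"
    features = (energy, zero_crossing_rate(frame))
    if stage1(features) == "speech":
        return "speech"
    return stage2(features)  # expected to return "music" or "song"


def segment(labels):
    """Merge runs of identical frame labels into (label, first_frame, last_frame)."""
    segments = []
    for i, label in enumerate(labels):
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1], i)
        else:
            segments.append((label, i, i))
    return segments


if __name__ == "__main__":
    # Toy input: a silent frame, a rapidly alternating frame, a slowly varying frame.
    signal = [0.0] * 8 + [1.0, -1.0] * 4 + [1.0] * 4 + [-1.0] * 4
    toy_stage1 = lambda f: "speech" if f[1] > 0.5 else "with_music"  # placeholder rule
    toy_stage2 = lambda f: "music"                                   # placeholder rule
    labels = [classify_frame(fr, 0.01, toy_stage1, toy_stage2)
              for fr in frames(signal, 8, 8)]
    print(segment(labels))
```

The toy rules in the demo exist only to exercise the control flow; they say nothing about how speech, music, and song actually separate in feature space.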
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009212550 http://hdl.handle.net/11536/68457 |
Appears in collections: | Thesis |