Title: Sparse Coding based Music Genre Classification using Spectro-Temporal Modulations (使用時頻變化調變之稀疏編碼於自動音樂曲風辨識)
Authors: Lin, Chih-Shan (林至善)
Chi, Tai-Shih (冀泰石)
Master Program of Sound and Music Innovative Technologies, College of Engineering
Keywords: music; genre; recognition; classification; sparse coding; dictionary learning; temporal pooling; MIR; music information retrieval
Issue Date: 2014
Abstract: Music is the spice of human life. In recent years, advances in technology and a growing range of music listening applications have given rise to a research field called Music Information Retrieval (MIR), and automatic music genre recognition is one of its classic problems. In this thesis, we assume that a specific musical instrument played in a specific style forms a specific spectral pattern on a spectrogram. We therefore treat a music spectrogram as a composition of many such spectral patterns, and we believe that the proportions in which these patterns combine can discriminate among music genres. We use the short-time Fourier transform (STFT) spectrogram and spectro-temporal modulation features as spectral pattern descriptors. Through dictionary learning and sparse coding, these descriptors are represented as compositions of spectral patterns, and the resulting codes are used as features to train a genre classifier. In addition, the auditory spectrogram, the constant-Q transform spectrogram, and their corresponding spectro-temporal modulation features are used in the same experiments, and the genre recognition performance of the resulting systems is compared. The results show that the system based on the constant-Q transform and its spectro-temporal modulation features achieves the highest genre recognition rate, outperforming the conventional approach, which is usually based on the STFT spectrogram.
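The pipeline outlined in the abstract (sparse codes computed per frame over a dictionary of spectral patterns, then pooled over time into one feature vector per clip) can be illustrated with a minimal sketch. This is not the thesis implementation: the dictionary here is a fixed toy set of atoms rather than a learned one, matching pursuit is used as a stand-in sparse coder, mean-of-absolute-values is assumed as the pooling rule, and all names (`matching_pursuit`, `pool_codes`, `ATOMS`, `FRAMES`) are hypothetical.

```python
import math


def normalize(atom):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in atom))
    return [v / norm for v in atom]


def matching_pursuit(frame, atoms, n_nonzero=2):
    """Greedy sparse coding: run at most `n_nonzero` atom-selection steps."""
    residual = list(frame)
    code = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Correlate the current residual with every unit-norm atom.
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(scores[k]))
        if abs(scores[best]) < 1e-12:
            break  # residual is (numerically) fully explained
        code[best] += scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return code


def pool_codes(codes):
    """Temporal pooling (assumed rule): mean absolute activation per atom."""
    n_frames = len(codes)
    return [sum(abs(c[k]) for c in codes) / n_frames for k in range(len(codes[0]))]


# Toy dictionary: three hypothetical spectral patterns over four frequency bins.
ATOMS = [normalize(a) for a in ([1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1])]

# Toy "spectrogram": one list of bin magnitudes per time frame.
FRAMES = [[2, 0, 0, 0], [0, 3, 3, 0], [2, 0, 0, 0]]

CODES = [matching_pursuit(f, ATOMS) for f in FRAMES]
FEATURE = pool_codes(CODES)  # fixed-length clip descriptor for a classifier
print(FEATURE)
```

In a fuller system the atoms would be learned from training spectrograms or modulation features, and the pooled vectors would be passed to a genre classifier. The sketch only shows how per-frame codes become a clip-level feature whose entries reflect how strongly each pattern is used.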
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070151902
http://hdl.handle.net/11536/125609
Appears in Collections: Thesis