Title: | Aligning Popular Music with Mono MIDI for Singing Pitch Extraction |
Authors: | Cai, Chang-You (蔡昌祐); Chen, Sin-Horng (陳信宏); Institute of Communications Engineering |
Keywords: | Pitch Extraction; harmonic; QBSH |
Issue Date: | 2012 |
Abstract: | Pitch, the fundamental frequency of a sound, is an important feature in Query-by-Singing/Humming (QBSH) systems, which typically retrieve the most similar song by matching pitch features. The accuracy of pitch detection is therefore critical. Although a human listener can recognize the singing pitch in a popular song, detecting it automatically is much harder for a computer because of interference from the background accompaniment and from harmonics.
In this thesis, we first apply an existing pitch extraction method to compute the pitch contour (melody line) of a popular song. That method suppresses the instrumental accompaniment to enhance the singing voice, processes a harmonic-summed spectrum, estimates the frequency range of the singing pitch to eliminate harmonics, and finally extracts the pitch contour by dynamic programming. It still has drawbacks, including inaccurate pitch tracking at the beginning of singing segments and a melody line that persists through non-singing parts. We therefore propose an improvement that aligns a monophonic MIDI file with the song being processed. The method computes spectra of both signals on a converted (MIDI) frequency scale, builds a similarity matrix, and applies dynamic time warping to find the time in the song that corresponds to each MIDI note. A post-processing step then corrects unnatural notes. Finally, the aligned MIDI is used to separate vocal from non-vocal segments and to recompute a more accurate pitch contour. Experimental results show that the proposed method effectively improves singing pitch detection in popular songs. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT079913578 http://hdl.handle.net/11536/49357 |
Appears in Collections: | Thesis |
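
The alignment step described in the abstract (a similarity matrix followed by dynamic time warping) can be illustrated with a minimal sketch. This is not the thesis implementation: the cosine similarity measure, the three-direction step pattern, the function names, and the toy one-hot "spectra" below are all assumptions chosen for illustration, since the record does not specify them.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Frame-by-frame cosine similarity of two spectrograms (frames x bins).

    Cosine similarity is an assumed choice; the record does not name the measure.
    """
    a_n = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_n = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a_n @ b_n.T


def dtw_path(similarity):
    """Return the minimum-cost warping path through cost = 1 - similarity."""
    cost = 1.0 - similarity
    n, m = cost.shape
    # acc[i, j] is the cheapest accumulated cost of aligning the first i rows
    # with the first j columns; row 0 and column 0 are an infinite border.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_previous
    # Backtrack from the last cell to the first, always taking the cheapest
    # predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def first_song_frame_per_midi_frame(path):
    """Map each MIDI frame index to the first song frame aligned with it."""
    onsets = {}
    for midi_frame, song_frame in path:
        onsets.setdefault(midi_frame, song_frame)
    return onsets


# Toy data (hypothetical): three MIDI frames, each a different one-hot "note",
# and a song in which the first and last notes are held for two frames.
midi_spectra = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
song_spectra = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
])

path = dtw_path(cosine_similarity_matrix(midi_spectra, song_spectra))
onsets = first_song_frame_per_midi_frame(path)
```

On this toy input the warping path stretches the first and last MIDI frames over two song frames each, so the onset map gives a song time for every MIDI frame. In a real setting the rows would be frequency-scale-converted spectra of the synthesized MIDI and of the recording, and a post-processing pass would still be needed to repair unnatural note boundaries, as the abstract notes.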