| Title: | A Two-stage Singing Voice Separation Algorithm Using Spectro-temporal Modulation Features |
| Authors: | Yen, Frederick Z.; Huang, Mao-Chang; Chi, Tai-Shih; College of Electrical and Computer Engineering |
| Keywords: | singing voice separation; spectro-temporal modulation; auditory scene analysis |
| Issue Date: | 2015 |
| Abstract: | A two-stage singing voice separation algorithm using spectro-temporal modulation features is proposed in this paper. First, music clips are transformed into auditory spectrograms, and the spectro-temporal modulation content of every time-frequency (T-F) unit of the auditory spectrograms is extracted using an auditory model. Then, the T-F units are sequentially clustered by the expectation-maximization (EM) algorithm into percussive, harmonic and vocal units through the proposed two-stage algorithm. Lastly, the singing voice is synthesized from the clustered vocal T-F units via time-frequency masking. The algorithm was evaluated on the MIR-1K dataset and demonstrated better separation results than our previously proposed one-stage algorithm. |
| URI: | http://hdl.handle.net/11536/136225 |
| ISBN: | 978-1-5108-1790-6 |
| Journal: | 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 |
| Start Page: | 3321 |
| End Page: | 3324 |
| Appears in Collections: | Conferences Paper |
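
The abstract's pipeline (sequential EM clustering of T-F units, then time-frequency masking) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each T-F unit is summarised by two hypothetical scalar features (`feat_stage1`, `feat_stage2`) instead of the full modulation representation, assumes that percussive units are split off first and vocal units second, and uses a plain two-component 1-D Gaussian mixture with a binary mask. All function names are illustrative.

```python
import math

def em_two_gaussians(xs, iters=50):
    """Fit a 2-component 1-D Gaussian mixture with EM.
    Returns one hard label per sample: 1 = higher-mean component, 0 = lower-mean."""
    mu = [min(xs), max(xs)]                 # deterministic initialisation
    spread = (mu[1] - mu[0]) or 1.0
    var = [spread ** 2 / 4.0, spread ** 2 / 4.0]
    w = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample
        resp = []
        for x in xs:
            p = [w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = (p[0] + p[1]) or 1e-300     # guard against underflow to zero
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate mean, variance and weight of each component
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
            w[k] = nk / len(xs)
    hi = 1 if mu[1] >= mu[0] else 0         # index of the higher-mean component
    return [1 if r[hi] > r[1 - hi] else 0 for r in resp]

def two_stage_labels(feat_stage1, feat_stage2):
    """Stage 1 (assumed): high feat_stage1 -> percussive.
    Stage 2 (assumed): among the remaining units, high feat_stage2 -> vocal."""
    stage1 = em_two_gaussians(feat_stage1)
    labels = ["percussive" if s == 1 else None for s in stage1]
    rest = [i for i, s in enumerate(stage1) if s == 0]
    if rest:
        stage2 = em_two_gaussians([feat_stage2[i] for i in rest])
        for i, s in zip(rest, stage2):
            labels[i] = "vocal" if s == 1 else "harmonic"
    return labels

def apply_vocal_mask(magnitudes, labels):
    """Binary T-F mask: keep the magnitude of vocal units, zero out the rest."""
    return [m if lab == "vocal" else 0.0 for m, lab in zip(magnitudes, labels)]

# Toy data: six flattened T-F units with made-up feature values.
feat_stage1 = [0.1, 0.2, 0.15, 0.12, 5.0, 5.1]
feat_stage2 = [0.0, 0.1, 3.0, 3.1, 9.9, 9.9]
magnitudes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

labels = two_stage_labels(feat_stage1, feat_stage2)
vocal = apply_vocal_mask(magnitudes, labels)
```

In this toy run the last two units are assigned to the percussive cluster in the first stage, and the remaining four are split into harmonic and vocal in the second stage; only the vocal magnitudes survive the mask. A real system would resynthesize audio from the masked auditory spectrogram, which this sketch does not attempt.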

