Title: Digital Hearing Aid System with Audio-Vision Fused VAD
Authors: Zheng, Chen-Han (鄭承翰)
Jou, Shyh-Jye (周世傑)
Institute of Electronics
Keywords: Digital Hearing Aid System; Voice Activity Detection (VAD); Audio-Visual Fused
Issue Date: 2016
Abstract: Hearing aid users often wear their devices in noisy environments, where noise degrades speech intelligibility, especially when the interference is speech-like. Voice activity detection (VAD) is a key component of noise reduction, but a VAD that relies on audio features alone loses accuracy at low SNR or under speech-like noise. Our previous work therefore used visual features of the lips to assist the VAD and achieved good results. However, that facial-feature extraction algorithm was encrypted and imposed many restrictions in order to reach high accuracy, so it had to be re-implemented or replaced with an alternative algorithm that is more robust and easier to extend to further applications. This thesis feeds both lip and audio features into a trained Support Vector Machine (SVM) classifier; the classifier's frame-wise decisions are then smoothed by a "Keeper" stage to produce the final VAD output. Because image processing is computationally expensive, the Audio-Visual Fused VAD (AV-VAD) shows its value mainly in low-SNR or speech-like-noise environments. In addition, we integrated the AV-VAD with probe-signal-based feedback cancellation and pitch-based noise reduction. The previous hearing aid (HA) system, implemented in Matlab, could not run in real time; we therefore rebuilt it in C++ with Visual Studio on a desktop PC, so that it processes the sound in real time and returns the processed signal to the user.
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070350276
http://hdl.handle.net/11536/139945
Appears in Collections: Thesis