Title: Implementation of a Real-Time Human Face/Sound Source Tracking and Speech Purification System on a Dual-Core Platform
Authors: 黃啟揚
胡竹生
Institute of Electrical and Control Engineering
Keywords: dual-core platform;human face tracking;sound source tracking;speech purification
Issue Date: 2007
Abstract: This thesis presents a human face tracking, sound source direction estimation, and speech purification system implemented on an embedded dual-core platform. The face tracking subsystem tracks facial features continuously and in real time, the sound source direction estimation subsystem locates the direction of the speaker, and the speech purification subsystem enhances speech arriving from the user's direction while suppressing noise from other directions, improving speech quality. The hardware platform is the TI DM6446 EVM, an embedded dual-core system in which the DSP core runs the algorithms and the ARM core mainly controls the system peripherals. Images are captured by a PTZ camera, and multichannel audio is acquired by a digital microphone array signal acquisition system developed in the authors' laboratory. The software integrates the audio and video algorithms: Voice Activity Detection (VAD), sound source direction estimation by Multiple Signal Classification (MUSIC), an adaptive beamformer for speech purification, and the Mean-Shift object tracking algorithm. The goal is a human-machine interface with both hearing and vision, with applications such as video conferencing, home security, and robotics.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009512544
http://hdl.handle.net/11536/38250
Appears in Collections: Thesis


Files in This Item:

  1. 254401.pdf
