A Study on Virtual Talking Head Animation by 2D Image Analysis and Voice Synchronization Techniques

Title:	A Study on Virtual Talking Head Animation by 2D Image Analysis and Voice Synchronization Techniques (original title: 利用二維影像分析及聲音同步技術作演講者臉部動畫之建構)
Authors:	林郁政; 蔡文祥; Institute of Computer Science and Engineering
Keywords:	phoneme; syllable
Issue Date:	2001
Abstract:	In recent years, animated talking heads have come to play an increasingly important role in many computer interface applications. This study proposes an approach to virtual talking head animation based on 2D image analysis and voice synchronization techniques. Instead of constructing the virtual talking head from complicated 3D models, as is conventional, we use 2D image sequences to simplify the animation process. The proposed method has two phases: a learning phase and an animation phase. In the learning phase, a motion capture system records images of a speaking face, with facial expressions, together with the sound. Each speaking-face image is then segmented into a base face and several facial parts, such as the mouth shape and the expression regions. In the animation phase, alpha blending is used to smooth the seam between the base face and the facial parts when they are combined to form new face images with expressions. To reduce the size of the viseme database, a method is proposed for classifying the 411 base syllables of Mandarin into 120 mouth-shape categories. To add facial expressions to the animation, the gamma distribution and the uniform distribution are used to model the time intervals between eye blinks and between eyebrow raises. Finally, a speech analyzer is used to obtain the duration of each syllable in the input speech, and a method is proposed that uses this timing information to synchronize the speech with the talking head animation. Experimental results show that the proposed methods are feasible and practical.
URI:	http://140.113.39.130/cdrfb3/record/nctu/#NT900394068 http://hdl.handle.net/11536/68595
Appears in Collections:	Thesis
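
As a rough illustration of three steps named in the abstract (alpha blending at the seam, gamma-distributed blink timing, and matching animation frames to syllable durations), the following is a minimal sketch and not the thesis's implementation. All function names, the gamma shape and scale values, and the 30 fps frame rate are assumptions chosen for illustration; the record does not state which parameters the authors used, nor which of the two distributions models which movement.

```python
import random


def alpha_blend(part, base, alpha):
    """Blend one pixel channel value.

    alpha = 1.0 keeps the facial-part pixel, alpha = 0.0 keeps the base face.
    Ramping alpha toward 0 near the part's border would soften the seam.
    """
    return alpha * part + (1.0 - alpha) * base


def blink_times(duration_s, shape=2.0, scale=1.5, rng=None):
    """Sample blink onset times within duration_s seconds.

    Gaps between blinks are drawn from a gamma distribution.
    shape and scale are hypothetical values, not taken from the thesis.
    """
    rng = rng or random.Random()
    times, t = [], 0.0
    while True:
        t += rng.gammavariate(shape, scale)  # gap to the next blink, in seconds
        if t >= duration_s:
            return times
        times.append(t)


def frames_per_syllable(durations_s, fps=30):
    """Convert syllable durations (seconds) into frame counts.

    Frame boundaries are rounded from the cumulative time, so rounding
    errors do not accumulate and the animation stays aligned with the audio.
    """
    counts, elapsed, shown = [], 0.0, 0
    for d in durations_s:
        elapsed += d
        end_frame = round(elapsed * fps)
        counts.append(end_frame - shown)
        shown = end_frame
    return counts
```

In such a sketch, the durations reported by a speech analyzer would be passed to `frames_per_syllable`, and each syllable's mouth-shape images would be displayed for the returned number of frames, while `blink_times` decides when a blink sequence is overlaid.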