Title: A Study on Automatic Construction of Virtual Talking Faces and Applications
Authors: Lai, Cheng-Jyun (赖成骏)
Tsai, Wen-Hsiang (蔡文祥)
Institute of Computer Science and Engineering
Keywords: talking-heads; virtual face; face recognition; facial animation; speech recognition; lip synchronization; automatic learning; feature extraction; sample-based image synthesis
Issue Date: 2003
Abstract: This study proposes a system for automatically creating virtual talking faces. The system is based on 2D facial images and comprises three processes: video recording, feature learning, and animation generation. For video recording, a transcript covering all classes of Mandarin syllables is proposed, so that a model can read its sentences instead of pronouncing every syllable separately. In feature learning, audio features, facial features, and base image sequences are all learned automatically. A sentence segmentation algorithm is proposed to assist the learning of the audio features of syllables. Base image sequences that exhibit natural head-shaking motion are generated. An image matching method is proposed to learn the positions of facial features in a face image with sub-pixel precision; the method can also be applied to shaking faces. In animation generation, several methods are proposed to improve animation quality. One method synchronizes speech with image frames. To create smoother animations, the proper number of transition frames between successive visemes is analyzed. A method is also proposed to find the best way to integrate a mouth image with a base image. To create more natural virtual faces, the behaviors of real people talking and singing are studied and simulated. Three kinds of interesting applications are implemented. Good experimental results show the feasibility of the proposed methods.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009123522
http://hdl.handle.net/11536/52757
Appears in Collections: Thesis


Files in This Item:

  1. 352201.pdf
