Title: | 可辨识运动姿态的时空延迟网路及其在唇语辨识之应用 A Space-Time Delay Neural Network for Motion Recognition and Its Application to Lipreading in Bimodal Speech Recognition |
Authors: | 林文杰 Lin, Wen-Chieh 林进灯 Chin-Teng Lin Institute of Electrical and Control Engineering |
Keywords: | Space-Time Delay Network;Lipreading;Neural Network;Speech Recognition;Computer Vision;Motion Recognition;STDNN |
Issue Date: | 1995 |
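Abstract: | In recent years, the demand for computer vision has grown in fields such as unmanned surveillance systems, multimodal human-computer interfaces, and traffic control systems, and the problem of recognizing object motion has drawn increasing attention. Most existing methods take the image sequence to be recognized, apply 2-D image feature extraction to convert it into a sequence of feature vectors, and then feed that sequence to a recognizer. The main drawback of such methods is that the information useful for recognizing object motion is confined to either the spatial dimension or the temporal dimension. We believe, however, that the information describing object motion resides in space-time rather than in the time dimension or the space dimension alone. We therefore propose a space-time delay neural network for the motion recognition problem. This new neural network can handle problems involving 3-D dynamic information, so that motion recognition is carried out in the space-time dimensions and the drawback described above is avoided. In addition, the network can still recognize an object's motion effectively when that motion is slightly shifted or distorted in the time or space dimension. This reduces the burden on the front-end image tracking system, because the network still works when the object is not located very precisely. We apply the network to lipreading, and the experimental results show that it has better learning and recognition ability than a recognition system built from conventional time delay networks. Research on motion recognition has received increasing attention in recent years because the need for computer vision is growing in many domains, such as surveillance systems, multimodal human-computer interfaces, and traffic control systems. Most existing approaches separate recognition into spatial feature extraction and time-domain recognition. However, we believe that the information of motion resides in the space-time domain and is not restricted to the time domain or the space domain alone. Consequently, it seems more reasonable to integrate feature extraction and classification in the space and time domains together. We propose a Space-Time Delay Neural Network (STDNN) that can deal with 3-D dynamic information, as in motion recognition. For the motion recognition problem that we focus on in this paper, the STDNN is a unified structure in which low-level spatiotemporal feature extraction and space-time recognition are embedded. It possesses spatiotemporal shift-invariant recognition abilities inherited from the time delay neural network (TDNN) and the space displacement neural network (SDNN). Unlike the multilayer perceptron (MLP), TDNN, and SDNN, the STDNN is constructed from vector-type nodes and matrix-type links so that spatiotemporal information can be gracefully represented in a neural network. Several experiments are conducted to evaluate the performance of the proposed STDNN. 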
In the moving Arabic numerals (MAN) experiments, which use image sequences to simulate an object moving in the space-time domain, the STDNN shows its generalization ability in spatiotemporal shift-invariant recognition. In the lipreading experiment, the STDNN recognizes lip motions from real image sequences. The results show that the STDNN performs better than the existing TDNN-based system, especially in generalization ability. Although lipreading is a specific application, the STDNN can be applied to other applications since no domain-dependent knowledge is used in the experiment. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT840327072 http://hdl.handle.net/11536/60332 |
Appears in Collections: | Thesis |
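
The abstract describes a network built from vector-type nodes and matrix-type links whose weights are shared across both time and space, which is what gives it spatiotemporal shift invariance. The record does not give the layer equations, so the following is only a minimal NumPy sketch of that idea under stated assumptions: the function name `space_time_delay_layer`, the array shapes, the `tanh` activation, and the global max pooling step are all illustrative choices, not the thesis's actual design.

```python
import numpy as np


def space_time_delay_layer(x, weights, bias):
    """Hypothetical space-time delay layer (illustrative, not from the thesis).

    x       : (T, H, W, C_in)  image sequence; each node holds a C_in vector
    weights : (kt, kh, kw, C_out, C_in)  one matrix-type link per offset
    bias    : (C_out,)
    returns : (T-kt+1, H-kh+1, W-kw+1, C_out)
    """
    T, H, W, _ = x.shape
    kt, kh, kw, c_out, _ = weights.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1, c_out))
    # The same link matrices are reused at every time step and position,
    # which is the weight sharing that TDNN and SDNN apply in time or space.
    for dt in range(kt):
        for dy in range(kh):
            for dx in range(kw):
                window = x[dt:dt + out.shape[0],
                           dy:dy + out.shape[1],
                           dx:dx + out.shape[2]]
                # Each vector node is multiplied by the link matrix.
                out += window @ weights[dt, dy, dx].T
    return np.tanh(out + bias)  # tanh is an assumed activation


rng = np.random.default_rng(0)

# A small random "motion" pattern placed inside an otherwise empty sequence.
x = np.zeros((8, 10, 10, 2))
x[1:4, 2:5, 2:5] = rng.normal(size=(3, 3, 3, 2))

# The same pattern delayed by 2 frames and displaced by (3, 1) pixels.
x_shifted = np.roll(x, shift=(2, 3, 1), axis=(0, 1, 2))

weights = rng.normal(size=(2, 3, 3, 4, 2))
bias = rng.normal(size=4)

y = space_time_delay_layer(x, weights, bias)
y_shifted = space_time_delay_layer(x_shifted, weights, bias)

# Shifting the input shifts the feature map by the same amount.
equivariant = np.allclose(y_shifted[2:, 3:, 1:], y[:-2, :-3, :-1])

# Pooling over time and space (an assumed readout) removes the shift.
pooled = y.max(axis=(0, 1, 2))
pooled_shifted = y_shifted.max(axis=(0, 1, 2))
invariant = np.allclose(pooled, pooled_shifted)

print(y.shape, equivariant, invariant)
```

In this sketch the shared weights make the layer's output move with the input, and the pooling step turns that into invariance, which is one common way to obtain the tolerance to small time and space shifts that the abstract attributes to the STDNN.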