Title: 可辨识运动姿态的时空延迟网路及其在唇语辨识之应用
A Space-Time Delay Neural Network for Motion Recognition and Its Application to Lipreading in Bimodal Speech Recognition
Author: 林文杰
Lin, Wen-Chieh
林进灯
Chin-Teng Lin
Institute of Electrical and Control Engineering
Keywords: Space-Time Delay Neural Network; STDNN; Lipreading; Neural Network; Speech Recognition; Computer Vision; Motion Recognition
Issue Date: 1995
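Abstract: In recent years, growing demand for computer vision in
fields such as unmanned surveillance systems, multimodal
human-computer interfaces, and traffic control systems has drawn
increasing attention to the problem of recognizing object motion.
Most existing methods apply 2-D image feature extraction to the
image sequence to be recognized, convert it into a sequence of
feature vectors, and then feed that sequence to a recognizer. The
main drawback of such methods is that the information useful for
recognizing object motion is confined to either the spatial
dimension or the temporal dimension. We believe, however, that the
information describing object motion resides in space-time, not in
the temporal or spatial dimension alone. We therefore propose a
space-time delay neural network to address the motion recognition
problem. This new neural network can handle problems involving 3-D
dynamic information, so that motion recognition is carried out in
the space-time dimensions, which avoids the drawback described
above. In addition, the network still recognizes object motion
effectively when the motion is slightly shifted or distorted along
the temporal or spatial dimension. This eases the burden on the
front-end image tracking system, because the network remains
effective even when the object is not localized very accurately.
We applied this network to lipreading. The experimental results
show that it has better learning and recognition ability than a
recognition system built on the conventional time delay network.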
Research on motion recognition has received more and more
attention in recent years because the need for computer
vision is increasing in many domains, such as surveillance
systems, multimodal human-computer interfaces, and traffic control
systems. Most existing approaches separate recognition
into spatial feature extraction and time-domain recognition.
However, we believe that the information of motion resides in
the space-time domain, not restricted to the time domain or
space domain only. Consequently, it seems more reasonable to
integrate the feature extraction and classification in the space
and time domains altogether. We propose a Space-Time Delay
Neural Network (STDNN) that can deal with the 3-D dynamic
information, such as motion recognition. For the motion
recognition problem that we focus on in this paper, the STDNN is a
unified structure, in which low-level spatiotemporal feature
extraction and space-time recognition are embedded. It possesses
the spatiotemporal shift-invariant recognition abilities that
are inherited from the time delay neural network (TDNN) and
space displacement neural network (SDNN). Unlike the multilayer
perceptron (MLP), TDNN, and SDNN, the STDNN is constructed from
vector-type nodes and matrix-type links such that the
spatiotemporal information can be gracefully represented in a
neural network. Experiments are conducted to evaluate the
performance of the proposed STDNN. In the moving Arabic numerals
(MAN) experiments, which use image sequences to simulate an
object moving in the space-time domain, the STDNN shows its
generalization ability in spatiotemporal shift-invariant
recognition. In the lipreading experiment, the STDNN recognizes
lip motions from real image sequences. The results show
that the STDNN performs better than the existing TDNN-based
system, especially in generalization ability. Although
lipreading is a fairly specific application, the STDNN can be
applied to other applications, since no domain-dependent knowledge
is used in the experiment.
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT840327072
http://hdl.handle.net/11536/60332
Appears in Collections: Thesis