Title: 可辨識運動姿態的時空延遲網路及其在唇語辨識之應用
A Space-Time Delay Neural Network for Motion Recognition and Its Application to Lipreading in Bimodal Speech Recognition
Author: 林文杰
Lin, Wen-Chieh
林進燈
Chin-Teng Lin
Institute of Electrical and Control Engineering (電控工程研究所)
Keywords: Space-Time Delay Neural Network;STDNN;Lipreading;Neural Network;Speech Recognition;Computer Vision;Motion Recognition
Issue date: 1995
Abstract: In recent years, the growing demand for computer vision in fields such as unattended surveillance systems, multimodal human-computer interfaces, and traffic control systems has drawn increasing attention to the problem of recognizing object motion. Most existing methods apply a 2-D image feature extraction method to the image sequence to be recognized, convert it into a sequence of feature vectors, and then feed that sequence to a recognizer. The main drawback of such methods is that the information useful for recognizing motion is confined to either the spatial dimension or the temporal dimension. We believe, however, that the information describing an object's motion resides in space-time, not in the time dimension or the space dimension alone. We therefore propose a space-time delay neural network to handle the motion recognition problem. This new neural network can handle problems involving 3-D dynamic information, so that motion recognition is carried out in the space-time dimensions and the drawback above is avoided. In addition, the network can still recognize a motion effectively when it is slightly shifted along the time or space dimension. This lightens the burden on the front-end image tracking system, because the network still works when the object is not located very accurately. We apply the network to lipreading; the experimental results show that it has better learning and recognition ability than a recognition system built from the conventional time delay neural network.
Research on motion recognition has received increasing attention in recent years because the need for computer vision is growing in many domains, such as surveillance systems, multimodal human-computer interfaces, and traffic control systems. Most existing approaches separate recognition into spatial feature extraction and time-domain recognition. However, we believe that the information of motion resides in the space-time domain and is not restricted to the time domain or the space domain alone. Consequently, it seems more reasonable to integrate feature extraction and classification in the space and time domains together. We propose a Space-Time Delay Neural Network (STDNN) that can deal with 3-D dynamic information, as required in motion recognition. For the motion recognition problem that we focus on in this paper, the STDNN is a unified structure in which low-level spatiotemporal feature extraction and space-time recognition are embedded. It possesses spatiotemporal shift-invariant recognition abilities inherited from the time delay neural network (TDNN) and the space displacement neural network (SDNN). Unlike the multilayer perceptron (MLP), TDNN, and SDNN, the STDNN is constructed from vector-type nodes and matrix-type links, so that spatiotemporal information can be gracefully represented in a neural network. Several experiments are conducted to evaluate the performance of the proposed STDNN.
In the moving Arabic numerals (MAN) experiments, which simulate an object moving in the space-time domain by means of image sequences, the STDNN shows its generalization ability in spatiotemporal shift-invariant recognition. In the lipreading experiment, the STDNN recognizes lip motions from real image sequences. The results show that the STDNN performs better than the existing TDNN-based system, especially in generalization ability. Although lipreading is a specific application, the STDNN can be applied to other applications, since no domain-dependent knowledge is used in the experiment.
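The shift-invariance property described in the abstract can be illustrated with a small sketch. This is not the thesis's actual formulation: it assumes that a "vector-type node" is a feature vector at each space-time grid position, that a "matrix-type link" is a weight matrix attached to each offset inside a small space-time receptive box, and that the same weights are shared across all positions. The function names `stdnn_layer` and `embed`, the layer sizes, and the `tanh` activation are hypothetical choices made only for illustration.

```python
import numpy as np

def stdnn_layer(x, weights, bias):
    """One hypothetical space-time delay layer (illustrative assumption only).

    x:       (T, H, W, C_in)  a vector-valued node at every space-time position
    weights: (kt, kh, kw, C_out, C_in)  one matrix link per offset in the box
    bias:    (C_out,)
    returns: (T-kt+1, H-kh+1, W-kw+1, C_out)
    """
    T, H, W, _ = x.shape
    kt, kh, kw, c_out, _ = weights.shape
    out = np.empty((T - kt + 1, H - kh + 1, W - kw + 1, c_out))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                box = x[t:t + kt, i:i + kh, j:j + kw]  # (kt, kh, kw, C_in)
                # Sum of matrix-vector products over every offset in the box.
                out[t, i, j] = np.tanh(
                    np.einsum("abcoi,abci->o", weights, box) + bias
                )
    return out

def embed(pattern, t0, i0, j0, shape=(8, 12, 12)):
    """Place a small space-time pattern on an all-zero background."""
    x = np.zeros(shape + (pattern.shape[-1],))
    pt, ph, pw, _ = pattern.shape
    x[t0:t0 + pt, i0:i0 + ph, j0:j0 + pw] = pattern
    return x

rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 3, 3, 4, 2)) * 0.1
bias = np.zeros(4)
pattern = rng.normal(size=(3, 4, 4, 2))

# Same motion pattern, shifted by (1, 2, 1) in (time, row, column).
a = stdnn_layer(embed(pattern, 2, 3, 3), weights, bias)
b = stdnn_layer(embed(pattern, 3, 5, 4), weights, bias)

# The layer output shifts along with the input (equivariance) ...
print(np.allclose(b[1:, 2:, 1:], a[:-1, :-2, :-1]))
# ... so a global pooling over space-time gives a shift-invariant response.
print(np.allclose(a.max(axis=(0, 1, 2)), b.max(axis=(0, 1, 2))))
```

Because the weights are shared over all space-time positions, shifting the input pattern only shifts the layer's output, and any pooling over the whole space-time grid then produces the same response. Under the assumptions above, this is one way a network could tolerate small misalignments from a tracking front end.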
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT840327072
http://hdl.handle.net/11536/60332
Appears in collections: Theses