Title: VISION-BASED ANALYSIS OF OBJECT POSES AND HUMAN ACTIVITIES FOR HUMAN-COMPUTER INTERACTION APPLICATIONS
Authors: Chin-Chun Chang
Wen-Hsiang Tsai
Institute of Computer Science and Engineering
Keywords: Human-computer interaction; Head pose and facial expression; Hand gesture; Leg movements; Object pose
Issue Date: 2000
Abstract:
In order to increase productivity and to improve everyday life, people have been trying for thousands of years to build intelligent systems that interact with humans in human ways. Such intelligent systems must be able to perceive human activities and to provide natural feedback. Because computer vision is a noninvasive way of sensing, vision-based systems for analyzing human activities are more natural and convenient for many applications than other sensing methods. Computer vision techniques for analyzing human activities are therefore essential for developing human-computer interaction systems.

Since it is natural for humans to convey intentions through head poses, facial expressions, hand gestures, and leg movements, this dissertation proposes new methods for analyzing these activities. In addition, by placing man-made marks on a person, the person's pose can be determined by analyzing the motion of these marks; this technique is often used for precise localization. Many vision-based localization techniques exist, but few of them can assess the quality of the input features or of the estimated results. This problem is also investigated in this dissertation.

For the analysis of head pose and facial expression, four new methods are proposed that estimate the head pose and the facial expression parameters from a single image of a human face. Two of them are direct methods with closed-form solutions, designed for simplified cases; the other two are iterative methods for the general case. The two direct methods and one of the iterative methods are derived from the perspective projection equations of feature points on the face. The other iterative method extends the concept of successive scaled orthographic approximations (the underlying iteration is sketched below); it requires no initial values, and both theoretical analysis and experiments show that it converges with high probability. Experimental results show that the proposed methods are robust and well suited to estimating head pose and facial expression parameters.

For the analysis of free-hand gestures, a new model-based system that analyzes free-hand gestures from single images by computer vision techniques is proposed. The orientation and position of the hand and the joint angles of the fingers and the thumb are estimated separately in two stages. The orientation and position of the hand are estimated first from the sparse range data formed by laser beams projected onto the back of the hand, using the generalized Hough transform. Estimating the joint angles of the fingers and the thumb is then formulated as an optimization problem: candidate configurations are generated by a novel inverse kinematics technique, and the best configuration is found by a new algorithm based on dynamic programming (a minimal sketch of the idea follows). With the relations among the fingers taken into account, the complexity of the proposed method is O(m^2), compared with O(m^12) for conventional exhaustive search, where m is the number of possible angles for each joint. Experiments show that the estimated parameters are suitable for 3-D hand gesture animation, and the applicability of the proposed system is further demonstrated by a simple hand gesture recognition system. These results prove the feasibility of the proposed approach.
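The abstract names the concept of successive scaled orthographic approximations without giving formulas. For orientation only, here is a minimal Python sketch of the classic POSIT-style iteration (DeMenthon and Davis) on which that concept is built; the dissertation's face-specific extension with expression parameters is not reproduced here, and the array shapes and variable names are illustrative assumptions.

```python
# Minimal sketch: pose from a single image by successive scaled
# orthographic (weak-perspective) approximations, POSIT-style.
# Inputs (assumed): model, an (n, 3) numpy array of 3-D feature points
# with model[0] the reference point; image, an (n, 2) array of their
# pixel projections; f, the focal length in pixels.

import numpy as np

def posit(model, image, f, iterations=10):
    A = model[1:] - model[0]        # model vectors from the reference point
    B = np.linalg.pinv(A)           # pseudo-inverse, reused every iteration
    eps = np.zeros(len(model) - 1)  # perspective corrections, start at zero
    for _ in range(iterations):
        # corrected image vectors under the current scaled orthographic model
        x = image[1:, 0] * (1 + eps) - image[0, 0]
        y = image[1:, 1] * (1 + eps) - image[0, 1]
        I, J = B @ x, B @ y
        s = (np.linalg.norm(I) + np.linalg.norm(J)) / 2.0  # scale factor
        i, j = I / np.linalg.norm(I), J / np.linalg.norm(J)
        k = np.cross(i, j)          # third rotation axis
        Z0 = f / s                  # depth of the reference point
        eps = A @ k / Z0            # refined perspective corrections
    R = np.vstack((i, j, k))        # rotation (rows approximately orthonormal)
    t = np.array([image[0, 0], image[0, 1], f]) * (Z0 / f)
    return R, t
```

Each pass solves a scaled orthographic problem and refines the correction terms eps, which is why no initial pose estimate is needed.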
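The abstract states that dynamic programming reduces the joint-angle search from O(m^12) to O(m^2) per joint pair, but does not spell out the formulation. The sketch below shows the standard chain-structured dynamic program that achieves this kind of reduction, assuming the cost decomposes into unary terms (image evidence per joint) and pairwise terms (compatibility of adjacent joints); unary_cost and pair_cost are hypothetical stand-ins, not the dissertation's actual cost functions.

```python
# Minimal sketch: dynamic programming over a chain of n finger joints,
# each with m candidate angles. unary_cost(j, a) scores joint j at
# angle index a against image evidence; pair_cost(j, a, b) penalizes
# incompatible angles of adjacent joints j and j + 1 (both hypothetical).

def best_chain_configuration(n, m, unary_cost, pair_cost):
    INF = float("inf")
    # cost[a] = best cost of a partial chain ending with the current joint at angle a
    cost = [unary_cost(0, a) for a in range(m)]
    back = []  # back-pointers for reconstructing the best configuration
    for j in range(1, n):
        new_cost, ptr = [INF] * m, [0] * m
        for b in range(m):
            for a in range(m):
                c = cost[a] + pair_cost(j - 1, a, b)
                if c < new_cost[b]:
                    new_cost[b], ptr[b] = c, a
            new_cost[b] += unary_cost(j, b)
        cost = new_cost
        back.append(ptr)
    best = min(range(m), key=lambda a: cost[a])
    total, config, b = cost[best], [best], best
    for ptr in reversed(back):       # trace back from the best final angle
        b = ptr[b]
        config.append(b)
    return list(reversed(config)), total
```

For n joints this search costs O(n·m^2) rather than O(m^n), matching the order of reduction claimed above.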
For the analysis of leg movements, a vision-based system that tracks and interprets leg motion in image sequences from a single camera is developed, with which a user controls his movement in a virtual world by his legs. Twelve control commands are defined. The trajectories of the color marks placed on the user's shoes are used to determine the types of leg movement by a first-order Markov process. The types of leg movement are then encoded symbolically as input to Mealy machines, which recognize the control command associated with a sequence of leg movements (a minimal sketch of such a recognizer is given below). The proposed system is implemented on a commercial PC without any special hardware. Because the transition functions of Mealy machines are deterministic, the implementation is simple and the response time of the system is short. Experimental results at a frame rate of 14 Hz prove the feasibility of the proposed approach.

To be reliable, a computer vision system for pose estimation must guarantee the quality of its output. In this study, two simple test functions based on statistical hypothesis testing are defined to ensure the quality of poses estimated from line features (one plausible form of such tests is sketched below). First, an error function is defined from the line features and the quality requirements. The first test function, based on a lower bound of the error function, rejects poor input before the pose is estimated, avoiding unnecessary computation. After pose estimation, the second test function decides whether the estimated pose is sufficiently accurate. Experimental results show that the first test function detects low-quality input and erroneous line correspondences, and that the overall method yields reliable estimates.
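The abstract describes the Mealy machines only by their determinism; their actual states and transition tables are not given. Below is a minimal, self-contained sketch of a deterministic Mealy machine recognizing a command from a symbol sequence; the states, the movement symbols 'L' and 'R', and the 'WALK' command are hypothetical examples, not the dissertation's twelve commands.

```python
# Minimal sketch of a deterministic Mealy machine: each (state, input)
# pair maps to exactly one (next_state, output) pair, so recognition is
# a constant-time table lookup per frame.

class MealyMachine:
    def __init__(self, start, transitions):
        # transitions: dict[(state, symbol)] -> (next_state, output)
        self.state = start
        self.transitions = transitions

    def step(self, symbol):
        self.state, output = self.transitions[(self.state, symbol)]
        return output

# Hypothetical example: emit 'WALK' whenever a left step ('L') is
# followed by a right step ('R').
walk_detector = MealyMachine("idle", {
    ("idle", "L"): ("half", None),
    ("idle", "R"): ("idle", None),
    ("half", "L"): ("half", None),
    ("half", "R"): ("idle", "WALK"),
})

for s in "LRLLR":
    out = walk_detector.step(s)
    if out:
        print(out)   # prints 'WALK' twice
```

The determinism of the transition table is what makes the per-frame work trivial, consistent with the short response time reported above.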
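The abstract does not specify the two test functions beyond their being based on statistical hypothesis testing and a lower bound of an error function. The sketch below is one plausible instantiation, assuming Gaussian line-feature residuals with known standard deviation so that the normalized squared error is chi-square distributed; it is an assumption-laden illustration, not the dissertation's actual test functions.

```python
# Minimal sketch of a hypothesis-testing quality check for pose
# estimation. Assumption (not from the abstract): residuals of the line
# features are i.i.d. Gaussian with known standard deviation sigma, so
# the normalized squared error follows a chi-square distribution with
# len(residuals) degrees of freedom.

from scipy.stats import chi2

def pose_is_acceptable(residuals, sigma, significance=0.05):
    """Accept the estimated pose if the residual error is no larger than
    what measurement noise alone would explain."""
    dof = len(residuals)
    statistic = sum((r / sigma) ** 2 for r in residuals)
    return statistic <= chi2.ppf(1.0 - significance, dof)

# The same idea supports the pre-test: if a lower bound on the error
# function, computed from the input lines alone, already exceeds the
# threshold, no pose can pass the test and estimation can be skipped.
def input_is_usable(error_lower_bound, dof, sigma, significance=0.05):
    return error_lower_bound / sigma ** 2 <= chi2.ppf(1.0 - significance, dof)
```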
In summary, the experimental results of all the proposed approaches demonstrate their feasibility and show that the proposed systems can serve as the basis for developing more effective human-computer interaction systems.

URI: http://140.113.39.130/cdrfb3/record/nctu/#NT890394004
http://hdl.handle.net/11536/66903
Appears in Collections: Thesis