Statistical Approaches for Emotion Detection, Gene Expression Clustering and Biological Pathway Reconstruction

Title:	情緒偵測、基因表現分群以及生物網路重建之統計方法 (Statistical Approaches for Emotion Detection, Gene Expression Clustering and Biological Pathway Reconstruction)
Authors:	闕棟鴻 (Tung-Hung Chueh); 盧鴻興 (Henry Horng-Shing Lu); Institute of Statistics
Keywords:	Emotion recognition; machine learning; MANOVA; microarray; pathway; Boolean network
Issue date:	2007
Abstract:	This thesis applies statistical approaches to three research problems: emotion detection, gene expression clustering, and biological pathway reconstruction.

In the first study, we develop an emotion recognition system based on supervised learning. Because communication between humans and machine interfaces is important, a device that can recognize human emotional states would be valuable. We propose an approach that handles the daily dependence and subject dependence in data from multiple subjects and samples. Thirty features were extracted from physiological signals recorded from subjects in three emotional states. The signals measured were electrocardiogram (ECG), skin temperature (SKT), and galvanic skin response (GSR). After removing the daily and subject dependence with MANOVA, six machine learning methods (Bayesian network learning, naive Bayesian classification, SVM, the C4.5 decision tree, the logistic model, and K-nearest-neighbor (KNN)) were applied to differentiate the emotional states. The results show that the logistic model gives the best classification accuracy and that the MANOVA adjustment significantly improves the performance of all six methods in the emotion recognition system.

In the second study, we use microarray experiments to explore the expression patterns of yeast genes during the diauxic shift, the transition from fermentation to respiration, in the laboratory (BY) and wild (RM) strains. In particular, we investigate the genes that are differentially expressed between the two strains. After applying gene filtering, cluster analysis, and a regression model to detect these differential expression patterns, we find a group of genes whose expression is negatively correlated between the two strains. In addition, the estimated time shifts of the expression profiles in this group fall mainly 1 hour before the time at which glucose consumption drops. Further work, such as network analysis, could build on these results to investigate the causal relationships among these genes.

Inferring genetic regulatory networks and biological pathways from gene expression patterns is a critical problem in bioinformatics. In the third study, we propose time-delay Boolean networks as a tool for exploring biological pathways. We assume that the indegree of each gene (i.e., the number of input genes to each gene) is bounded by a constant K, and we take K = 2 for inference. In addition, we consider two kinds of relations between the output gene and the Boolean function of its input genes: similarity and prerequisite. In our inference strategy, we compare every output gene and every pair of input genes against the eight basic relations and calculate the corresponding p-scores. Since a smaller p-score is expected to indicate a more likely relation, we combine the consistent relations and identify the most probable relation between the output gene and the pair of input genes. We illustrate the method using a simulated example and a published microarray expression dataset of the yeast Saccharomyces cerevisiae from experiments on the regulation of gluconeogenesis by Cat8 and Sip4. The results show that the proposed algorithm can effectively reconstruct the original network model and is extensible to more realistic network models.
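The pairwise search described for the third study can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the eight candidate relations are taken to be AND/OR with optional negation of each input, the score is a plain mismatch fraction standing in for the p-score (which the abstract does not define), the delay is fixed at one time step, and the names `RELATIONS`, `mismatch_score`, `best_relation`, and the `demo` data are hypothetical.

```python
from itertools import combinations

# Eight candidate two-input Boolean functions. This set (AND/OR with optional
# negation of each input) is an assumption; the thesis's eight basic relations
# may be defined differently.
RELATIONS = {
    "a AND b": lambda a, b: a and b,
    "a AND NOT b": lambda a, b: a and not b,
    "NOT a AND b": lambda a, b: (not a) and b,
    "NOT a AND NOT b": lambda a, b: (not a) and (not b),
    "a OR b": lambda a, b: a or b,
    "a OR NOT b": lambda a, b: a or not b,
    "NOT a OR b": lambda a, b: (not a) or b,
    "NOT a OR NOT b": lambda a, b: (not a) or (not b),
}


def mismatch_score(series, out, pair, func, delay=1):
    """Fraction of time steps where func(inputs at t) != output at t + delay.

    A simple stand-in for the p-score: smaller means a better fit.
    """
    a, b = pair
    steps = len(series[out]) - delay
    wrong = sum(
        bool(func(series[a][t], series[b][t])) != bool(series[out][t + delay])
        for t in range(steps)
    )
    return wrong / steps


def best_relation(series, out, delay=1):
    """Return (score, input pair, relation name) with the lowest mismatch score."""
    candidates = [gene for gene in series if gene != out]
    scored = [
        (mismatch_score(series, out, pair, func, delay), pair, name)
        for pair in combinations(candidates, 2)  # indegree bounded by K = 2
        for name, func in RELATIONS.items()
    ]
    return min(scored, key=lambda item: item[0])


# Toy binarized time series in which g3 at time t+1 equals g1 AND NOT g2 at time t.
demo = {
    "g1": [0, 0, 1, 1, 0, 1],
    "g2": [0, 1, 0, 1, 1, 0],
    "g3": [0, 0, 0, 1, 0, 0],
}
result = best_relation(demo, "g3")
```

On the toy data, `result` identifies the pair (`g1`, `g2`) with the relation `a AND NOT b` and a mismatch score of zero. Real expression data would first need to be binarized, and a proper significance measure would replace the mismatch fraction.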
URI:	http://140.113.39.130/cdrfb3/record/nctu/#GT009126802 http://hdl.handle.net/11536/55623
Appears in collections:	Theses

Files in this item:

680201.pdf
