Title: Integrating genome-wide expression profiles and structural protein-protein interactions for comprehensive network analysis and applications
Authors: Huang, Sing-Han (黃星翰); Yang, Jinn-Moon (楊進木)
Institute of Bioinformatics and Systems Biology
Keywords: Structural protein-protein interaction; Gene expression profile; Biomarker; Functional module; Protein-protein interaction network
Publication date: 2017
Abstract: One of the crucial steps toward understanding cellular behavior and disease mechanisms is to integrate protein-protein interaction (PPI) networks with genome-wide expression profiles. In the past few years, many methods have been proposed to construct PPI networks. Nevertheless, most of them treat PPI networks as static graphs and lack the dynamic properties that arise from the gene expression processes regulating the proteins in the network. Integrating PPI networks with gene expression profiles has provided important insights into the dynamic organization of PPI networks. However, these dynamic networks cannot provide detailed atomic binding models that reflect the binding mechanisms of PPIs, nor can they describe the relationship between mutations and diseases. Therefore, dynamic PPI networks with detailed atomic binding models are needed to reveal molecular and disease mechanisms. To address these issues, we proposed a new framework that integrates a structurally resolved PPI network with gene expression profiles. First, we proposed a new method, called Consensus Mutual Information (CoMI), and a new statistical measure, the CoMI standardized score (ZCoMI), for analyzing genome-wide expression profiles and identifying genes that are significantly differentially expressed in a specific disease. In addition, we constructed a human three-dimensional (3D) structural PPI network (hDiSNet) with detailed atomic binding models and disease-associated mutations to understand the mechanisms of disease-associated proteins and their mutations. To systematically analyze PPI networks together with genome-wide expression profiles, we proposed a modularity structure matrix (MS-matrix) that identifies hub proteins and functional modules and represents local and global relationships in a dynamic PPI network.
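The abstract does not state how CoMI or ZCoMI are computed. Purely as an illustration of the general idea of mutual-information gene scoring followed by standardization, the sketch below scores each gene by the mutual information between its discretized expression and the sample class labels, averages that score over several discretizations as a stand-in for the "consensus" step, and converts the scores to z-scores across genes. The equal-frequency binning, the averaging over bin counts, and all function names (`mutual_information`, `discretize`, `consensus_mi`, `z_normalize`) are hypothetical assumptions, not the thesis's definitions.

```python
import numpy as np


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete label vectors."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.sum((x == xv) & (y == yv)) / n
            if p_xy == 0:
                continue  # empty cell contributes nothing
            p_x = np.sum(x == xv) / n
            p_y = np.sum(y == yv) / n
            mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi


def discretize(values, n_bins):
    """Equal-frequency binning: map each value to a bin index 0..n_bins-1 by rank."""
    ranks = np.argsort(np.argsort(values))
    return (ranks * n_bins) // len(values)


def consensus_mi(expr, labels, bin_counts=(2, 3, 4)):
    """Hypothetical 'consensus' score: mean MI over several discretizations per gene."""
    return np.array([
        np.mean([mutual_information(discretize(row, b), labels) for b in bin_counts])
        for row in expr
    ])


def z_normalize(scores):
    """Standardize scores across genes (mean 0, standard deviation 1)."""
    return (scores - scores.mean()) / scores.std()


# Toy data: 2 genes x 6 samples; gene 0 separates the two classes, gene 1 does not.
expr = np.array([[1, 2, 3, 10, 11, 12],
                 [5, 1, 9, 4, 8, 2]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])
scores = consensus_mi(expr, labels)
z = z_normalize(scores)
print(scores, z)
```

On this toy input, gene 0 receives the larger score and therefore a positive z-score. With real data, one would presumably rank or threshold genes by such a standardized score; the cutoff used in the thesis is not given here.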
In this thesis, we tested CoMI and ZCoMI on the gene expression profiles of breast cancer, leukemia, and brain cancer to identify significant genes. The results indicate that the identified genes satisfy the criteria of ideal biomarkers and can distinguish cancer subtypes and high-risk patients based on overall survival. hDiSNet is a scale-free network consisting of 5,177 proteins and 19,239 PPIs, annotated with 5,843 mutations. Analysis of hDiSNet shows that disease-related mutations are often located in interacting domains and at contacting residues that form hydrogen bonds or are conserved in the PPI family. As an application example, hDiSNet provides insights into mutations in the ErbB signaling pathway for interpreting the pathogenic mechanisms of brain cancer. Moreover, our results show that proteins identified by the MS-matrix are often date or party hubs in the PPI network, and that members of the identified functional modules share functional similarity and similar expression patterns. We believe that these strategies and methods are useful for identifying candidate biomarkers and provide opportunities for revealing the mechanisms of disease-related mutations.
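The date/party hub distinction is usually drawn from co-expression: a party hub tends to be co-expressed with most of its interaction partners, whereas a date hub is not. The sketch below applies that criterion (the average Pearson correlation between a hub and its partners) to a toy network. It is not the MS-matrix described in the thesis; the degree threshold, the 0.5 correlation cutoff, and the name `classify_hubs` are illustrative assumptions.

```python
import numpy as np


def classify_hubs(edges, expr, min_degree=3, pcc_cutoff=0.5):
    """Label hub proteins as 'party' or 'date' hubs by partner co-expression.

    edges: iterable of (protein_a, protein_b) interaction pairs.
    expr: dict mapping protein -> expression vector (same sample order).
    min_degree, pcc_cutoff: hypothetical thresholds chosen for illustration.
    """
    # Build an undirected adjacency list from the PPI edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = {}
    for protein, partners in neighbors.items():
        if len(partners) < min_degree:
            continue  # not treated as a hub under this toy threshold
        # Pearson correlation between the hub and each partner's expression.
        pccs = [np.corrcoef(expr[protein], expr[p])[0, 1] for p in partners]
        # High average co-expression -> party hub; otherwise date hub.
        labels[protein] = "party" if np.mean(pccs) >= pcc_cutoff else "date"
    return labels


# Toy data: H1's partners rise and fall with it; H2's partners do not.
x = [1, 2, 3, 4, 5]
expr = {
    "H1": x, "A": [2, 4, 6, 8, 10], "B": [2, 3, 4, 5, 6], "C": [2, 5, 8, 11, 14],
    "H2": x, "D": [5, 4, 3, 2, 1], "E": [1, -1, 1, -1, 1], "F": [2, 1, 2, 1, 2],
}
edges = [("H1", "A"), ("H1", "B"), ("H1", "C"),
         ("H2", "D"), ("H2", "E"), ("H2", "F")]
print(classify_hubs(edges, expr))  # {'H1': 'party', 'H2': 'date'}
```

Constant expression vectors make `np.corrcoef` return NaN, so genes with no variation across samples would need to be filtered out before applying this kind of check to real data.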
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070057202
http://hdl.handle.net/11536/142654
Appears in Collections: Thesis