Title: MirTS: 運用基因演算法與支持向量機來辨認 miRNA 和其目標基因的工具
MirTS: a software for identifying miRNA target interactions in human using genetic algorithm-support vector machine
Authors: 黃紹甄
Huang, Shao-Zhen
黃憲達
Huang, Hsien-Da
Institute of Bioinformatics and Systems Biology
Keywords: miRNA target prediction; support vector machine; genetic algorithm; miRNA
Issue Date: 2013
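Abstract: A miRNA is a non-coding RNA fragment about 22 nucleotides long that plays an important role in gene regulation in most organisms: by binding to the 3’ untranslated region (3’-UTR) of an mRNA, it causes mRNA degradation and represses protein translation. Because miRNAs are so important, computational identification of their target genes is a cornerstone for understanding miRNAs and their targets in greater depth. However, existing algorithms rely too heavily on sequence complementarity in the seed region or on cross-species conservation, and although later algorithms have tried to address these shortcomings, there is still room for improvement. The aim of this study is therefore to build a model that can accurately predict miRNA target genes. The study combines a genetic algorithm with a support vector machine to select optimized parameters and the best feature subset, uses high-throughput sequencing data as the training and test sets, and then uses data from other sources to evaluate and improve the model's performance. In addition, the training and test data are analyzed to better understand the interactions between miRNAs and their targets. In this study, we built a comprehensive, large-scale dataset, in which the negative data consist of miRNA-target gene pairs selected for their high correlation coefficients, and we also built a model that can predict miRNAs and their target genes.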
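The negative-set construction mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes that a miRNA-gene pair is kept as a negative example when the Pearson correlation of their expression profiles reaches a cutoff, presumably because a repressed target would tend to be anti-correlated with its miRNA. The cutoff of 0.8, the function names, and the expression values are all hypothetical and are not taken from the thesis or from TCGA.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation is undefined, treat as 0
    return cov / (sx * sy)


def select_negative_pairs(mirna_expr, gene_expr, threshold=0.8):
    """Return (miRNA, gene, r) triples whose Pearson r is at least `threshold`.

    `threshold` is an illustrative assumption, not a value from the thesis.
    """
    negatives = []
    for mirna, m_profile in mirna_expr.items():
        for gene, g_profile in gene_expr.items():
            r = pearson(m_profile, g_profile)
            if r >= threshold:
                negatives.append((mirna, gene, r))
    return negatives


# Toy expression profiles across five samples (made-up numbers).
mirna_expr = {"miR-A": [1.0, 2.0, 3.0, 4.0, 5.0]}
gene_expr = {
    "GENE1": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises together with miR-A
    "GENE2": [9.0, 7.5, 6.1, 4.0, 2.2],  # falls as miR-A rises
}
negatives = select_negative_pairs(mirna_expr, gene_expr)
print(negatives)
```

In this toy input only the positively correlated pair (miR-A, GENE1) passes the cutoff; the anti-correlated GENE2 is left out of the negative set.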
MicroRNAs (miRNAs) are small non-coding RNAs of ~22 nucleotides that play an important role in most organisms by regulating gene expression at the post-transcriptional level. By base-pairing with the 3’ untranslated region (3’-UTR) of an mRNA, miRNAs silence genes through mRNA degradation and translational repression. Given the importance of miRNAs, computational prediction of miRNA-mRNA pairs is a starting point for understanding the relationship between miRNAs and their targets. However, existing methods focus strongly on seed-region complementarity or cross-species conservation. Although miRNA target prediction algorithms have made significant progress, there is still room for improvement. In this study, we therefore aim to construct a model that predicts miRNA-mRNA interactions with high accuracy. We apply a GA-SVM algorithm, which combines a support vector machine (SVM) with a genetic algorithm (GA), to increase the accuracy of miRNA target classification by selecting an optimal feature subset. High-throughput datasets of miRNA-mRNA interactions are used for training and testing. Furthermore, data from miRTarBase were used for testing to improve the performance. The performance of the model is further evaluated on an independent set and compared with other algorithms. In addition, the datasets are analyzed further to characterize miRNA-mRNA interactions. In conclusion, we constructed a comprehensive dataset drawn from different methods, notably including negative data generated from TCGA expression profiles by selecting miRNA-target pairs with a high Pearson correlation coefficient. A GA-SVM model was built for miRNA target prediction. Several kinds of information about miRNA-target interactions were taken into account, laying a foundation for researchers investigating miRNAs and their target interactions.
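The GA-based feature selection described above can be illustrated with a minimal sketch. It assumes a bit-mask chromosome (one bit per feature), tournament selection, one-point crossover, bit-flip mutation, and elitism; these operator choices and all parameter values are assumptions, not details from the thesis. To stay dependency-free, `toy_fitness` is a hypothetical stand-in: in an actual GA-SVM the fitness would be something like the cross-validated accuracy of an SVM trained on the selected features, and the chromosome would typically also encode the SVM hyperparameters.

```python
import random


def pick(pop, fitness, rng):
    """Binary tournament: draw two individuals and return the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness(mask)`."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1 = pick(pop, fitness, rng)
            p2 = pick(pop, fitness, rng)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each position.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical stand-in for SVM cross-validation accuracy: reward masks that
# include the "informative" features and lightly penalize mask size.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask = ga_feature_selection(n_features=10, fitness=toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

Because the best mask is copied unchanged into each new generation, the best fitness never decreases from one generation to the next. Swapping `toy_fitness` for a real classifier-based score is the only change needed to turn this into a wrapper around an SVM.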
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070157201
http://hdl.handle.net/11536/75805
Appears in Collections: Thesis