Title: Large-scale Orthology Detection
Authors: Chung, Jen-Chun (鐘仁駿)
Lin, Tiao-Yin (林苕吟)
Lu, Chin-Lung (盧錦隆)
Institute of Bioinformatics and Systems Biology
Keywords: Orthologous genes; Paralogous genes; Large scale; High-throughput
Publication date: 2011
Abstract: Rapid advances in genome sequencing technology have led to unprecedented growth in the volume of genome sequence data. However, experimental methods for determining gene function cannot keep pace with today's high-throughput sequencing, so the functions of many genes in sequenced genomes remain unknown. It has been reported that orthologous genes in different species should have the same function, so identifying orthologous genes helps predict gene functions in sequenced genomes. Recently, a method called QuartetS was proposed for large-scale orthology detection. QuartetS first identifies paralogous genes and then treats the genes that are not paralogous as orthologous. To determine whether two genes x and y from two different species are paralogous, QuartetS constructs a quartet gene tree from x, y, and two paralogous genes z1 and z2 from a third species, and then uses an approximate method to locate the root of that tree. If the predicted root lies on the inner edge of the quartet tree, x and y are considered paralogous. Otherwise, QuartetS repeats the procedure with another pair of paralogous genes as z1 and z2. If none of the pre-prepared pairs of paralogous genes can show that x and y are paralogous, x and y are considered a pair of orthologous genes. QuartetS has two shortcomings: (i) it assumes that the mutation rate in species evolution is constant, and (ii) it estimates the location of the root in the quartet tree with an approximate method. In this study, we improve QuartetS in two ways: (i) the mutation rate of species is not assumed to be constant; (ii) the location of the root in the quartet gene tree is predicted by adding a fifth gene o that is an outgroup gene with respect to x, y, z1, and z2. Experimental results show that our improved QuartetS method distinguishes paralogous genes from orthologous genes better than the original QuartetS.
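The decision loop described in the abstract can be sketched roughly as follows. This is only an illustration of the control flow, not the thesis implementation: the function name `classify_pair` and the predicate `root_on_inner_edge` are hypothetical, and how the quartet tree is built and rooted (by the original approximation or by an outgroup gene o) is left entirely to that caller-supplied predicate.

```python
from typing import Callable, Iterable, Tuple

Gene = str

# Hypothetical predicate (an assumption, not from the thesis): returns True
# when the rooted quartet tree built from (x, y, z1, z2) has its root on the
# inner edge. Tree construction and rooting are hidden behind it.
RootTest = Callable[[Gene, Gene, Gene, Gene], bool]


def classify_pair(
    x: Gene,
    y: Gene,
    paralog_pairs: Iterable[Tuple[Gene, Gene]],
    root_on_inner_edge: RootTest,
) -> str:
    """Label genes x and y as 'paralogous' or 'orthologous'.

    Each (z1, z2) is a pre-prepared pair of paralogous genes from a third
    species. The first pair whose quartet tree is rooted on the inner edge
    is taken as evidence that x and y are paralogous.
    """
    for z1, z2 in paralog_pairs:
        if root_on_inner_edge(x, y, z1, z2):
            return "paralogous"
    # No pair could show paralogy, so x and y are treated as orthologous.
    return "orthologous"


if __name__ == "__main__":
    # Toy stand-in for a real rooting step: a hand-written lookup of the
    # quartets that are rooted on the inner edge.
    inner_edge_quartets = {("x", "y", "c1", "c2")}

    def toy_test(x: Gene, y: Gene, z1: Gene, z2: Gene) -> bool:
        return (x, y, z1, z2) in inner_edge_quartets

    print(classify_pair("x", "y", [("c3", "c4"), ("c1", "c2")], toy_test))
    print(classify_pair("x", "y", [("c3", "c4")], toy_test))
```

Passing the rooting step in as a predicate keeps the loop identical for the original and the improved method; only the predicate would differ.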
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079851517
http://hdl.handle.net/11536/48209
Appears in collections: Theses


Files in this item:

  1. 151701.pdf