Title: Reconstructing Genome Trees of Prokaryotes Using Overlapping Genes (使用重叠基因建构原核生物的基因体树)
Authors: Cheng, Chih-Hsien (郑智先)
Lu, Chin Lung (卢锦隆)
Institute of Bioinformatics and Systems Biology
Keywords: bioinformatics; algorithm; genome tree; overlapping gene; prokaryote; genome rearrangement
Issue date: 2008
Abstract: Overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. They are ubiquitous in microbial genomes and more conserved between species than non-overlapping genes. Based on these properties, we previously implemented a web server, named OGtree, that allows the user to reconstruct genome trees of prokaryotes according to their pairwise OG distances. By analogy to analyses of gene content and gene order, we defined the OG distance between two genomes by combining a measure of OG content (i.e., the normalized number of shared orthologous OG pairs) with a measure of OG order (i.e., the normalized OG breakpoint distance) over their whole genomes. A shortcoming of using breakpoints to define the OG distance is that it cannot be applied to multi-chromosomal genomes. In addition, some distantly related prokaryotic genomes may share too few orthologous OGs to evaluate their pairwise OG distances properly.
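The combination of OG content and OG order described above can be illustrated with a minimal Python sketch. This is not the OGtree implementation, and the abstract does not give the exact formulas. The sketch assumes that each genome is a single linear chromosome, written as an ordered list of signed integers in which each integer identifies one orthologous OG pair (sign gives orientation) and occurs at most once. The normalization by the smaller genome, the equal weighting, and all function names are hypothetical choices made for illustration only.

```python
def shared_ids(a, b):
    """IDs of OG pairs present in both genomes (orientation ignored)."""
    return {abs(x) for x in a} & {abs(x) for x in b}


def og_content_distance(a, b):
    """1 - (shared OG pairs / size of the smaller genome).

    Assumption: the source only says "normalized"; this normalization
    is a hypothetical choice.
    """
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 1.0
    return 1.0 - len(shared_ids(a, b)) / smaller


def breakpoint_distance(a, b):
    """Fraction of adjacencies in `a` that are not conserved in `b`.

    Both genomes are first restricted to their shared OG pairs. An
    adjacency (x, y) counts as conserved if `b` contains (x, y) or its
    reverse complement (-y, -x) as consecutive elements.
    """
    shared = shared_ids(a, b)
    ra = [x for x in a if abs(x) in shared]
    rb = [x for x in b if abs(x) in shared]
    if len(ra) < 2:
        return 0.0
    adjacencies_b = set()
    for x, y in zip(rb, rb[1:]):
        adjacencies_b.add((x, y))
        adjacencies_b.add((-y, -x))
    breaks = sum((x, y) not in adjacencies_b for x, y in zip(ra, ra[1:]))
    return breaks / (len(ra) - 1)


def og_distance(a, b, weight=0.5):
    """Weighted mix of content and order distances (weight is an assumption)."""
    return (weight * og_content_distance(a, b)
            + (1.0 - weight) * breakpoint_distance(a, b))


if __name__ == "__main__":
    genome_a = [1, 2, 3, 4, 5]
    genome_b = [1, -3, -2, 4, 6]  # segment 2..3 reversed, 5 replaced by 6
    print(og_distance(genome_a, genome_b))
```

A matrix of such pairwise distances could then be passed to a distance-based tree-building method. Note that this breakpoint count is only defined here for single linear chromosomes, which reflects the limitation the abstract points out.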
In this study, we therefore define a new OG order distance that is based on more biologically accurate rearrangements (e.g., reversals, transpositions and translocations) rather than breakpoints, and that is applicable to both uni-chromosomal and multi-chromosomal genomes. In addition, we expand the term "gene" to include both its coding sequence and its regulatory regions, so that two adjacent genes whose coding sequences or regulatory regions overlap are considered a pair of overlapping genes. The rationale is that overlapping regulatory regions of distinct genes suggest that the regulation of their expression is more or less interrelated. Based on these modifications, we have reimplemented OGtree as a new web server, OGtree2.0, and evaluated its accuracy of genome tree reconstruction on a test dataset consisting of 21 Proteobacteria genomes. Our experimental results show that OGtree2.0 significantly outperforms both its previous version, OGtree, and another similar server, BPhyOG, in the quality of genome tree reconstruction: the phylogenetic tree obtained by OGtree2.0 is largely congruent with the reference tree, which coincides with the taxonomy accepted by biologists for these Proteobacteria.
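The extended notion of a gene described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used by OGtree2.0. The abstract does not say how regulatory regions are delimited, so the sketch models them as a fixed-length upstream window (the hypothetical parameter `upstream`) on a linear chromosome. The `Gene` type and the function names are also hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive, start < end regardless of strand
    end: int
    strand: str  # "+" or "-"


def extended_interval(gene, upstream):
    """Coding interval plus an assumed upstream regulatory window."""
    if gene.strand == "+":
        return max(1, gene.start - upstream), gene.end
    # On the minus strand, upstream lies at higher coordinates.
    return gene.start, gene.end + upstream


def overlapping_pairs(genes, upstream=0):
    """Names of adjacent gene pairs whose extended intervals intersect.

    With upstream=0 this reduces to overlap of coding sequences only.
    """
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        left_start, left_end = extended_interval(left, upstream)
        right_start, right_end = extended_interval(right, upstream)
        if left_start <= right_end and right_start <= left_end:
            pairs.append((left.name, right.name))
    return pairs


if __name__ == "__main__":
    genes = [
        Gene("a", 100, 500, "+"),
        Gene("b", 490, 900, "+"),
        Gene("c", 950, 1200, "+"),
        Gene("d", 1300, 1500, "-"),
    ]
    print(overlapping_pairs(genes))                # coding overlap only
    print(overlapping_pairs(genes, upstream=100))  # with regulatory window
```

Widening the window increases the number of OG pairs detected, which is the effect the abstract relies on to obtain enough orthologous OGs between distantly related genomes.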
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079651505
http://hdl.handle.net/11536/43263
Appears in Collections: Thesis


Files in This Item:

  1. 150501.pdf
