Title: Variance Estimation for Nucleotide Substitution Models with Application to Phylogenetic Tree Selection
Author: 陳葦珊
王秀瑛
Wang, Hsiuying
Institute of Statistics
Keywords: nucleotide substitution model; substitution number; variance estimator; phylogenetic tree; maximum likelihood method; neighbor-joining method; UPGMA method
Publication Date: 2016
Abstract: This dissertation addresses two topics. The first is variance estimation for nucleotide substitution models. The existing variance estimators for most evolutionary models were derived by approximating the estimator of the nucleotide substitution number with a first-order Taylor expansion. For each of the F81, F84, HKY85 and TN93 nucleotide substitution models, we derive three new variance estimators. They are obtained from the second-order Taylor expansion of the substitution number estimator, the first-order Taylor expansion of the squared deviation of the estimator from its mean, and the second-order Taylor expansion of that squared deviation, respectively. A simulation study comparing these estimators with the existing one shows that the estimator derived from the second-order Taylor expansion of the squared deviation is more accurate than the other three. We also compare these estimators with a variance estimator obtained by the bootstrap method. The simulations show that the bootstrap estimator performs similarly to the estimator derived from the second-order Taylor expansion of the squared deviation. Because the latter has an explicit form, it is computationally more efficient than the bootstrap, which requires considerably more time.
The second topic is the selection of a suitable phylogenetic tree by testing the substitution number between sequences. A phylogenetic tree is a widely used tool for showing the evolutionary relationships among taxa. Many methods for constructing phylogenetic trees have been proposed in the literature, such as the maximum likelihood, neighbor-joining and UPGMA methods. The topologies of trees built by different methods are not exactly the same. Even for the same type of tree, the topology can differ depending on the nucleotide substitution model it embeds, such as the JC69, K80 or TN93 model. Although each type of tree has its advantages, selecting a suitable tree among these choices is a challenging problem. In this study, we propose a tree selection method based on testing the substitution number between pairs of sequences. An ebolavirus example is used to illustrate and validate the method. In addition, this approach can select a suitable nucleotide substitution model for a particular type of tree. For the ebolavirus example, the JC69 model is the selected substitution model for the maximum likelihood tree, and the TN93 model is the selected substitution model for the UPGMA tree.
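To make the comparison between a first-order Taylor (delta-method) variance and a bootstrap variance concrete, the following is a minimal sketch and not the dissertation's estimators. It assumes the F81-type distance d = -E ln(1 - p/E), where p is the proportion of differing sites and E = 1 - (sum of squared base frequencies) is treated as known; E = 0.75 corresponds to equal base frequencies (JC69). All function names and the toy alignment are hypothetical.

```python
import math
import random


def f81_distance(p, E=0.75):
    """Estimated substitutions per site from the proportion p of differing sites.

    Assumption: E is treated as a known constant (E = 0.75 gives JC69).
    """
    return -E * math.log(1.0 - p / E)


def delta_variance(p, n, E=0.75):
    """First-order Taylor (delta-method) variance of the distance estimate.

    Uses Var(p_hat) = p(1 - p)/n and the derivative dd/dp = 1/(1 - p/E).
    """
    return p * (1.0 - p) / (n * (1.0 - p / E) ** 2)


def bootstrap_variance(diff, E=0.75, reps=2000, seed=0):
    """Bootstrap variance: resample sites with replacement.

    diff[i] is 1 if the two sequences differ at site i, else 0.
    """
    rng = random.Random(seed)
    n = len(diff)
    dists = []
    for _ in range(reps):
        p_star = sum(rng.choices(diff, k=n)) / n
        if p_star < E:  # the distance is undefined when p_star >= E
            dists.append(f81_distance(p_star, E))
    mean = sum(dists) / len(dists)
    return sum((d - mean) ** 2 for d in dists) / (len(dists) - 1)


# Toy data (hypothetical): 1000 aligned sites, 100 of which differ.
diff = [1] * 100 + [0] * 900
p_hat = sum(diff) / len(diff)
print("distance:", f81_distance(p_hat))
print("delta-method variance:", delta_variance(p_hat, len(diff)))
print("bootstrap variance:", bootstrap_variance(diff))
```

The closed-form variance is a single arithmetic expression, while the bootstrap recomputes the distance for every resample, which illustrates why an explicit estimator is cheaper to evaluate.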
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT079926802
http://hdl.handle.net/11536/138788
Appears in Collections: Thesis