Title: Developing New Bioinformatics Tools to Explore the Sequences, Structures, Functions and Evolution of RNA Viruses, with Influenza Viruses and Enteroviruses as Application Models: Prediction of RNA Secondary Structure with Pseudoknots and Its Application in Detecting Programmed Ribosomal Frameshifting Genes
Author: 盧錦隆
LU CHIN LUNG
Department of Biological Science and Technology, National Chiao Tung University
Keywords: Algorithms; Bioinformatics; Biological database; Comparative genomics; RNA secondary structure; Pseudoknot
Issue date: 2009
Abstract:
RNA pseudoknots are found in almost all classes of naturally occurring RNAs and play important roles in a variety of biological processes, such as RNA replication, transcription and translation. For example, they can act as stimulators of programmed ribosomal frameshifting (PRF), a recoding mechanism commonly observed in RNA viruses such as SARS-CoV. In PRF, the translating ribosome switches at a specific position from the initial (zero) reading frame to one of two alternative reading frames (mostly -1, otherwise +1) and, as a result, produces a protein that differs from the one produced by standard translation. It has been reported that, for RNA viruses that use PRF, even small alterations in PRF efficiency can inhibit their propagation, suggesting that PRF sites are potential targets for antiviral therapeutics. Most pseudoknots described to date are of the so-called H-type, in which nucleotides from a hairpin loop pair with a single-stranded region outside the hairpin to form a helical stem that is adjacent or almost adjacent to the hairpin stem. More complicated H-type pseudoknots have also been reported, such as the one that stimulates PRF in SARS-CoV, which has an additional stem formed within a long loop. Predicting these H-type pseudoknots, as well as even more complicated pseudoknots, can therefore improve our understanding of RNA structures and their associated functions. In the standard thermodynamic model, a pseudoknot-free RNA secondary structure of minimum free energy (MFE) can be computed in polynomial time. However, when general pseudoknots are allowed in the RNA secondary structure, the computation becomes intractable, since the problem has been shown to be NP-hard.
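The two structural notions above can be made concrete with a short sketch. It is only an illustration, not the method of this project: the first function is a Nussinov-style dynamic program that maximizes the number of nested base pairs, a simplified stand-in for true MFE computation, and the second tests whether a set of base pairs contains the crossing pattern that defines a pseudoknot. The function names, the minimum loop length of 3, and the pairing rules (Watson-Crick plus G-U wobble) are assumptions made for the example.

```python
# Assumed pairing rules for this sketch: Watson-Crick pairs plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_nested_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs, O(n^3) time.

    Simplified stand-in for MFE folding: counts pairs instead of summing
    free-energy terms. min_loop is the assumed minimum number of unpaired
    bases enclosed by any pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


def has_pseudoknot(pairs):
    """True if two pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    ordered = sorted((min(p), max(p)) for p in pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return True
    return False
```

The recurrence only ever combines disjoint or nested intervals, which is why it runs in polynomial time and also why it can never produce the crossing pairs that `has_pseudoknot` detects.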
Several polynomial-time algorithms have been proposed to find an MFE secondary structure with a restricted class of pseudoknots (including H-type pseudoknots). However, they are not yet practical for large-scale sequences because of their high running time and/or space requirements. In addition, these algorithms may fail to detect an H-type pseudoknot that is actually present in the native structure of a long RNA sequence. Previously, we proposed a heuristic approach that improves the prediction of H-type pseudoknots by these algorithms and applied it to the detection of -1/+1 PRF genes. However, this heuristic can still be improved and extended in several respects: its time and space complexity, its ability to handle large-scale sequences, its prediction accuracy, and its ability to predict more complicated pseudoknots and other types of PRF (such as -2 and +2 PRF). In this component project, we aim to develop new algorithms and heuristics for these improvements and extensions, based on approaches from comparative genomics and from structural and functional bioinformatics, and also to develop databases of RNA secondary structures with pseudoknots and of PRF genes. To achieve these goals, we need to work together with the other team members, who will provide expertise from their research areas as well as related programs to be developed in their component projects. At the same time, the programs and databases to be developed in this component project will help team members in the other component projects to study RNA viruses, such as influenza viruses and enteroviruses, and to further explore the relationships among their sequences, structures, functions and evolution.
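As a rough illustration of one ingredient of -1 PRF detection, the sketch below scans a sequence for candidate slippery heptamers. It is not the heuristic described in this abstract. It assumes the commonly cited canonical motif X XXY YYZ (X any nucleotide, Y either A or U, Z any nucleotide except G), and the function name `find_slippery_sites` is hypothetical. A real detector would also need to check the reading frame and look for a downstream stimulatory structure such as a pseudoknot.

```python
import re

# Assumed canonical -1 PRF slippery motif X XXY YYZ:
# X = any nucleotide repeated three times, Y = A or U repeated three times,
# Z = A, C or U. The lookahead lets overlapping candidates be reported.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2([AU])\3\3[ACU]))")


def find_slippery_sites(seq):
    """Return (start, heptamer) for every candidate slippery site in seq.

    Accepts RNA or DNA letters in either case; T is converted to U.
    Start positions are 0-based.
    """
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in SLIPPERY.finditer(rna)]
```

For example, `find_slippery_sites("GGUUUAAACGG")` reports the heptamer `UUUAAAC` at position 2. Motif matching alone yields many false positives in long sequences, so the hits are best treated as candidates for further structural analysis.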
Official document number: NSC97-2221-E009-081-MY3
URI: http://hdl.handle.net/11536/101123
https://www.grb.gov.tw/search/planDetail?id=1748112&docId=297767
Appears in collections: Research Projects