Title: 用計算方法識別哺乳類動物的基因轉錄啟始點
Computationally Identifying Core Promoter Regions of Genes in Mammalian Genomes
Author: 林在營
Tzai-Ying Lin
Hsien-Da Huang
Keywords: Transcriptional start sites; Promoter; SVM; DNA stability
Publication date: 2005
Abstract: Gene transcription, the process by which RNA is copied from a DNA segment of the chromosome, is an extremely important mechanism in the cell. It is regulated by transcription factors (TFs), which bind mostly and specifically to the 5' end of genes, the so-called promoter region. The core promoter is a region of about 100 base pairs flanking the transcriptional start site (TSS), the point at which transcription begins, and it serves as the recognition site for the basal transcription apparatus. Accurately locating the core promoter upstream of a gene is the first step in deciphering the regulation of gene transcription. In this study, we combined a Support Vector Machine (SVM) with three regulatory features, namely statistically significant 6-mer patterns, nucleotide composition, and DNA stability, to identify transcriptional start sites in mammalian genomes. The experimentally verified transcriptional start sites were obtained from DBTSS, and the mammalian genomic sequences were obtained from NCBI build 35. K-fold cross-validation was used to evaluate the prediction performance of the three features extracted from core promoters, and the preliminary results suggest that the prediction accuracy can exceed 70%. Compared with previously developed approaches, our method showed better prediction performance.
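As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch of two of the three features (nucleotide composition and 6-mer counts) together with a k-fold split. It is not the thesis implementation. All function names are hypothetical, the list of "statistically significant" 6-mers is passed in by the caller because the abstract does not say how they were selected, and the DNA stability feature and the SVM training step are left out because the abstract gives neither the energy parameters nor the SVM settings.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def nucleotide_composition(seq):
    """Return the fractions of A, C, G and T in seq.

    Symbols outside ACGT (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(base for base in seq if base in NUCLEOTIDES)
    total = sum(counts.values())
    return [counts[base] / total if total else 0.0 for base in NUCLEOTIDES]


def kmer_counts(seq, k=6):
    """Count overlapping k-mers in seq, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in NUCLEOTIDES for base in kmer):
            counts[kmer] += 1
    return counts


def feature_vector(seq, selected_kmers, k=6):
    """Concatenate nucleotide composition with counts of the selected k-mers.

    selected_kmers is an assumption of this sketch: a caller-supplied list
    standing in for the statistically significant 6-mer patterns.
    """
    counts = kmer_counts(seq, k)
    return nucleotide_composition(seq) + [float(counts[m]) for m in selected_kmers]


def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists; fold f tests samples with i % n_folds == f."""
    for fold in range(n_folds):
        test = [i for i in range(n_samples) if i % n_folds == fold]
        train = [i for i in range(n_samples) if i % n_folds != fold]
        yield train, test
```

In a full experiment, each candidate window around an annotated TSS would be turned into a vector with `feature_vector`, a classifier would be trained on the `train` indices of each fold, and its accuracy measured on the `test` indices. In practice the samples should be shuffled before splitting so that each fold contains both positive and negative examples.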


