Title: The evolution of embedded simple repeats in human LINE 1
Authors: Li, Ying-Kui (李應逵)
Lin, Yeong-Shin (林勇欣)
Institute of Bioinformatics and Systems Biology
Keywords: repetitive sequences; long transposable element; microsatellite; LINE-1
Issue date: 2010
Abstract: A large part of the human genome consists of repetitive DNA, including LINE-1 and microsatellites. A microsatellite is a short motif of fewer than ten nucleotides repeated in tandem many times. LINE-1 is the most abundant retrotransposon in mammalian genomes and is able to move within the genome. Microsatellites are known to arise in ORF2 (open reading frame 2) of LINE-1, but their exact locations, frequencies, and types remain unclear, and many earlier microsatellite studies addressed only length changes over short time scales. We therefore set out to locate microsatellites within LINE-1, to determine whether some region is especially prone to generating them, and to consider why. We also used the evolutionary relationships and divergence times among human, chimpanzee, and marmoset to date the expansion of LINE-1 and of its microsatellites.
The results show that positions 5701-5800 bp, within LINE-1 ORF2, are a likely region of high-frequency microsatellite occurrence: interrupted sites occur there more often than at other positions. BLAST searches and alignments showed that the sequences around the interrupted sites are highly conserved and tend to be confined to one region. To delimit this region, we used a two-step alignment to approximate the extent of the microsatellite-prone segment, which we call the length-mutation hotspot. The length of the hotspot is highly variable, ranging from a dozen or so to more than a thousand nucleotides, and the probability of a length change differs among LINE-1 subfamilies. In each subfamily, hotspots of 16-18 bp were the most common, and from our results we infer that the length-mutation hotspot of the ancestral LINE-1 was 16-18 bp long. From the 16-18 bp hotspot sequences we derived a consensus, which we call the standard length-mutation hotspot sequence. Comparing it with hotspot sequences whose length has changed (length-changed hotspot sequences) shows which positions gained or lost nucleotides. Examining the length-changed hotspot regions showed how many microsatellites arise in the hotspot and that three types predominate: TA, TG, and CA.
In addition, the bulk of microsatellite expansion in the length-mutation hotspot took place about 35-40 million years ago, slightly earlier than the speciation of human and marmoset. Finally, we observed one example of an orthologous microsatellite in human, chimpanzee, and marmoset, and used dot-plots and sequence alignment to examine its similarities and differences among the three species.
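The abstract does not say which tool was used to detect microsatellites or to count sites per 100 bp region. The following is only a minimal illustrative sketch, assuming a plain regular-expression scan for tandem repeats and 1-based coordinates. The function names `find_microsatellites` and `window_label`, the parameters `max_motif` and `min_repeats`, and the toy sequence are all hypothetical and do not come from the thesis.

```python
import re


def find_microsatellites(seq, max_motif=9, min_repeats=3):
    """Return (start, end, motif, repeats) for each tandem repeat in seq.

    Coordinates are 0-based and half-open. The thresholds are assumptions
    for illustration, not values taken from the thesis.
    """
    seq = seq.upper()
    # Shortest motif of 1..max_motif bases, followed by at least
    # (min_repeats - 1) further copies of the same motif.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        repeats = len(m.group(0)) // len(motif)
        hits.append((m.start(), m.end(), motif, repeats))
    return hits


def window_label(pos, size=100):
    """Label the fixed-size window containing a 1-based position."""
    start = ((pos - 1) // size) * size + 1
    return "%d-%d" % (start, start + size - 1)


# Toy sequence (not real LINE-1 data): a TA run and a CA run with flanks.
toy = "GGC" + "TA" * 6 + "GGC" + "CA" * 5 + "GT"
for start, end, motif, repeats in find_microsatellites(toy):
    print(start, end, motif, repeats)
print(window_label(5750))
```

A scan of this kind reports the motif as it is first encountered, so a TA run preceded by an A would be labelled AT. Any real analysis would need to normalise equivalent motifs and tolerate interruptions within a repeat, which this sketch does not attempt.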
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079751508
http://hdl.handle.net/11536/45817
Appears in Collections: Thesis