A General CLIP-Seq data analysis framework based on a K-means-pHMM learning and clustering algorithm

Full metadata record

DC Field	Value	Language
dc.contributor.author	蕭瓊柏	zh_TW
dc.contributor.author	洪瑞鴻	zh_TW
dc.contributor.author	Hsiao, Chiung-Po	en_US
dc.contributor.author	Hung, Jui-Hung	en_US
dc.date.accessioned	2018-01-24T07:38:03Z	-
dc.date.available	2018-01-24T07:38:03Z	-
dc.date.issued	2016	en_US
dc.identifier.uri	http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070357201	en_US
dc.identifier.uri	http://hdl.handle.net/11536/139477	-
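dc.description.abstract	RNA-binding proteins (RBPs) play important roles in living organisms: the post-transcriptional regulation of RNAs depends on the assistance of RBPs. In recent years, the experimental technique CLIP-Seq (cross-linking immunoprecipitation high-throughput sequencing) has been developed to help study the relationship between RBPs and RNAs. CLIP-Seq irradiates cells with ultraviolet light to strengthen the cross-linking between RNAs and RBPs, then uses immunoprecipitation (IP) to capture the RBPs, and finally extracts the RNAs bound to the RBPs for high-throughput sequencing. When the captured RBP is Argonaute (AGO), AGO forms the RNA-induced silencing complex (RISC) with microRNAs (miRNAs), so we obtain not only the RNA sequences but also many miRNAs. Several kinds of CLIP-Seq experiments now exist: HIT-CLIP, PAR-CLIP, and iCLIP. What is currently missing is a general analysis framework that finds the binding sites between RBPs and RNAs, supports prediction of miRNA-RNA binding relationships, and supports all existing kinds of CLIP-Seq techniques. In this thesis, we propose a CLIP analysis pipeline with K-means-pHMM at its core. It is highly general and can analyze next-generation sequencing data from all three CLIP variants: HIT-CLIP, PAR-CLIP, and iCLIP. Simulation tests show that our unsupervised machine learning algorithm converges quickly. Finally, we collected multiple NCBI CLIP-Seq datasets, reanalyzed them, and observed molecular biological phenomena consistent with previous studies.	zh_TW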
dc.description.abstract	RNAs are regulated by RNA-binding proteins (RBPs) that bind single- or double-stranded RNAs in cells. RBPs bind RNAs, function as ribonucleoprotein complexes, and are involved in splicing (e.g., U1 snRNP), RNA editing (e.g., ADAR), polyadenylation (e.g., CPSF), mRNA localization (e.g., ZBP1), post-transcriptional regulation (e.g., miRNA-RISC), etc. To understand the relationship between RBPs and RNAs, cross-linking immunoprecipitation followed by next-generation sequencing (CLIP-Seq) was developed. There are currently three major variants of CLIP-Seq: HIT-CLIP, PAR-CLIP, and iCLIP. Many algorithms have been proposed to define binding sites; nevertheless, each can be applied to only one or a few CLIP-Seq variants, and the results are hard to integrate and compare. In this work, we propose a universal algorithm, GLIP, that can be applied to all three CLIP-Seq variants efficiently and with strong performance.	en_US
dc.language.iso	zh_TW	en_US
dc.subject	High-throughput sequencing	zh_TW
dc.subject	CLIP-Seq	zh_TW
dc.subject	RNA-binding protein	zh_TW
dc.subject	microRNA	zh_TW
dc.subject	Unsupervised machine learning	zh_TW
dc.subject	Profile hidden Markov model	zh_TW
dc.subject	NGS	en_US
dc.subject	CLIP-Seq	en_US
dc.subject	HIT-CLIP	en_US
dc.subject	PAR-CLIP	en_US
dc.subject	iCLIP	en_US
dc.subject	RNA-binding protein	en_US
dc.subject	Profile HMM	en_US
dc.subject	Machine learning	en_US
dc.title	開發基於K-means-pHMM 機器學習演算法之交聯免疫沈澱法高通量定序的泛用分析框架	zh_TW
dc.title	A General CLIP-Seq data analysis framework based on a K-means-pHMM learning and clustering algorithm	en_US
dc.type	Thesis	en_US
dc.contributor.department	Institute of Bioinformatics and Systems Biology	zh_TW
Appears in Collections:	Thesis