正規化表示式的限制型多重序列比對之研究

Full metadata record

DC Field	Value	Language
dc.contributor.author	李威勳	en_US
dc.contributor.author	Wei-Hsun Lee	en_US
dc.contributor.author	盧錦隆	en_US
dc.contributor.author	Chin Lung Lu	en_US
dc.date.accessioned	2014-12-12T03:09:24Z	-
dc.date.available	2014-12-12T03:09:24Z	-
dc.date.issued	2006	en_US
dc.identifier.uri	http://140.113.39.130/cdrfb3/record/nctu/#GT009451511	en_US
dc.identifier.uri	http://hdl.handle.net/11536/82003	-
dc.description.abstract	在生物資訊及計算生物領域中，多重序列比對 (Multiple Sequences Alignment) 在發掘基因體或蛋白質序列的生物意義上是很有用的工具。通常生物學家對序列的結構╱功能╱演化關係已有一些初步的認識，如活化部位的殘基、分子間的雙硫鍵、受質結合的部位、酵素的活化性及保守性的 Motifs 等等。因此在做多重序列比對時，生物學家希望有一個工具能讓一些結構性的╱功能性的╱保留性的核甘酸或殘基可以排在一起。 2004年我們的研究團隊已開發出一套限制型多重序列比對 (Multiple Sequence Alignment with Constraints) 的工具叫 MuSiC。至目前為止 MuSiC 已被許多生物學家證實在生物的研究上是相當有用的。然而，MuSiC 中的 Constraint 只能是允許 Mismatches 但不能允許Gap的簡單序列片段。很多生物重要的 Patterns 像是 PROSITE database 中的 Motifs 在 MuSiC 中是無法使用的。因此，在此論文中我們主要的目的為研究並開發出一套能夠使用正規表示式的限制型多重序列比對 (Multiple Sequence Alignment with Regular Expression Constraints) 的演算法與工具。我們採用了 Progressive 的方法來解決正規表示式的限制型多重序列比對的問題。事實上，這個方法的核心在於設計出有效率的演算法來解決正規表示式的限制型兩條序列比對問題 (Pairwise Sequence Alignment with Regular Expression Constraints Problem)。我們是將正規表示式 (Regular Expression) 轉成有限狀態機 (Finite Automaton)，並使用 Dynamic Programming 與 Divide-and-Conquer 方法來設計一個在時間與空間上都有效率的演算法來求得最佳化的正規表示式限制型兩條序列比對。然後，我們再跟據此演算法發展出能夠使用多個正規表示式的限制型多重序列比對工具：RE-MuSiC (Multiple Sequence Alignment with Regular Expression Constraints)，其網址在 http://140.113.239.131/RE-MUSIC	zh_TW
dc.description.abstract	Multiple sequence alignment (MSA) has received much attention in the fields of bioinformatics and computational biology because it is very useful for discovering the biological meanings of sequences. Usually, biologists may have advanced knowledge about the structural, functional, and /or evolutionary relationships about sequences of their interest, such as active site residues, intramolecular disulfide bonds, substrate binding sites, enzyme activities, conserved motifs (consensuses) and so on. They would expect an MSA program that is able to align these sequences such that the structural, functional, and/or conserved bases (i.e., nucleotides or residues) are aligned together. In 2004, our research group has already developed a tool, called MuSiC, for multiple sequence alignment with constraint. Since then, it has been proven by many biologists to be useful in biological research. Nevertheless, the constraints allowed in MuSiC can only be simple substrings allowing mismatches but disallowing gaps in the occurrences. Many biologically important patterns such as motifs in the PROSITE database cannot be supported by MuSiC, either. Hence, in this thesis, we study and develop an algorithm and a tool for the problem of multiple sequence alignment with regular expression constraints (RECMSA). We used a progressive approach to design an efficient program for solving the RECMSA problem. The kernel of this approach is an efficient algorithm for solving the problem of pairwise sequence alignment with regular expression constraints (RECPSA). We transform the regular expressions into a finite automaton and then use dynamic programming and divide-and-conquer approaches to develop a time and space efficient algorithm for optimally solving the RECPSA problem, which can be implemented effectively on a desktop PC with limited memory. Using this algorithm as the kernel, we developed a web-server called RE-MuSiC (Multiple Sequence Alignment with Regular Expression Constraints) that is available on-line at http://140.113.239.131/RE-MUSIC.	en_US
dc.language.iso	en_US	en_US
dc.subject	限制型序列比對	zh_TW
dc.subject	正規化表示式	zh_TW
dc.subject	PROSITE 資料庫	zh_TW
dc.subject	sequence alignment with constraints	en_US
dc.subject	regular expression	en_US
dc.subject	PROSITE database	en_US
dc.title	正規化表示式的限制型多重序列比對之研究	zh_TW
dc.title	On the Study of Multiple Sequence Alignment with Regular Expression Constraints	en_US
dc.type	Thesis	en_US
dc.contributor.department	生物資訊及系統生物研究所	zh_TW
Appears in Collections:	Thesis

Files in This Item:

151101.pdf

151101.pdf

If it is a zip file, please download the file and unzip it, then open index.html in a browser to view the full text content.