Title: | SimSearcher: A Local Similarity Search Engine for Biological Sequence Databases (生物序列資料庫區域相似性搜尋引擎:SimSearcher) |
Authors: | Tian-Haw Tsai (蔡天浩); Suh-Yin Lee (李素瑛); Institute of Computer Science and Engineering |
Keywords: | bioinformatics; sequence homology; local similarity; search engine; sequence clustering |
Date Issued: | 2002 |
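Abstract: | Finding sequences in a biological sequence database that are similar to a query sequence is an important research topic in bioinformatics. To this end, we developed a similarity search engine. To speed up the search, we apply data mining techniques: the common patterns in the database are extracted in advance and then used at search time to filter out most of the database. More precisely, when a user submits a query sequence, we check whether it contains any of the database's common patterns. Only if it does do we retrieve the database sequences that contain the patterns found in the query, compute their similarity to the query one pair at a time, and report the results to the user.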
The main concept and architecture of the system follow the DELPHI method proposed by IBM; however, we changed parts of the system design and added ideas of our own. To improve performance, this thesis also proposes a new sequence clustering method. Finally, we use experiments to analyze the characteristics of the proposed clustering algorithm and to evaluate the search efficiency and accuracy of the system. Besides discussing the experimental results, we also describe the issues that remain open for future work. Because biological sequence databases are growing rapidly, a local similarity search engine that retrieves sequences efficiently and effectively is indispensable. In this thesis, we develop such an engine using data mining techniques. First, all frequent patterns in the database are extracted and recorded in a one-time preprocessing step. A query sequence is then checked against the patterns found during preprocessing. A region of the query and a region of a database sequence that both match the same pattern form a candidate seed for a local similarity. Finally, each seed region pair is extended and scored to determine whether a local similarity exists with a score high enough to report. For computational efficiency, a novel clustering approach is proposed and integrated into the system, which is based on DELPHI, the local similarity search engine proposed by IBM. Extensive experiments demonstrate the performance of the system. With the proposed architecture, local similarity searches can be performed accurately and efficiently. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#NT910392020 http://hdl.handle.net/11536/70091 |
Appears in Collections: | Thesis |
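The abstract describes a three-stage pipeline: mine frequent patterns once, use them to pick seed region pairs for a query, then extend and score each seed. The Python sketch below illustrates that general flow only; it is not the thesis's implementation. It assumes fixed-length substrings (k-mers) as a stand-in for the mined patterns, a simple match/mismatch score with ungapped extension, and omits the sequence clustering step entirely. All function names, parameters, and default values are hypothetical.

```python
from collections import defaultdict


def build_pattern_index(database, k=3, min_support=2):
    """One-time preprocessing: keep k-mers found in at least min_support sequences.

    Assumption: k-mers stand in for the frequent patterns mined in the thesis.
    """
    occurrences = defaultdict(list)
    for seq_id, seq in enumerate(database):
        for pos in range(len(seq) - k + 1):
            occurrences[seq[pos:pos + k]].append((seq_id, pos))
    return {
        pattern: hits
        for pattern, hits in occurrences.items()
        if len({seq_id for seq_id, _ in hits}) >= min_support
    }


def _extend(query, target, qi, ti, step, match, mismatch, drop):
    """Ungapped extension in one direction; stop once the score falls
    more than `drop` below the best score seen so far."""
    best = running = 0
    while 0 <= qi < len(query) and 0 <= ti < len(target):
        running += match if query[qi] == target[ti] else mismatch
        if running > best:
            best = running
        elif best - running > drop:
            break
        qi += step
        ti += step
    return best


def search(query, database, index, k=3, threshold=5,
           match=1, mismatch=-1, drop=2):
    """Return (sequence id, best score) pairs whose score reaches threshold."""
    best = {}
    for q_pos in range(len(query) - k + 1):
        # Only database regions sharing a pattern with the query are examined.
        for seq_id, t_pos in index.get(query[q_pos:q_pos + k], ()):
            target = database[seq_id]
            score = (
                k * match
                + _extend(query, target, q_pos + k, t_pos + k, 1,
                          match, mismatch, drop)
                + _extend(query, target, q_pos - 1, t_pos - 1, -1,
                          match, mismatch, drop)
            )
            if score >= threshold and score > best.get(seq_id, 0):
                best[seq_id] = score
    return sorted(best.items(), key=lambda item: -item[1])
```

A query that contains none of the indexed patterns never touches the database sequences, which is the source of the filtering speedup the abstract describes. For example, with `database = ["ACGTACGTGG", "TTACGTACCA", "GGGGCCCCAA"]` and `index = build_pattern_index(database)`, calling `search("ACGTAC", database, index)` scores only the first two sequences.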