Title: | 建立蛋白質資料庫與知識庫 (Construction and Implementation of Protein Database and Knowledge Base) |
Authors: | Shih-yen Ku (顧世彥); Yuh-Jyu Hu (胡毓志); Wei-Pong Yang (楊維邦); Institute of Computer Science and Engineering |
Keywords: | protein; database; knowledge base; data mining; structural analysis; structural alphabets; alphabet representation; k-means clustering; SOM clustering |
Issue Date: | 2004 |
Abstract: | This thesis presents a worked example of building our own protein database and of developing a data mining method to construct a protein knowledge base from it. We use a combinatorial approach (SUM-K) to find the basic building blocks of protein structure and to define a structural alphabet (SA) that represents the structural characteristics of proteins. Each protein is transformed into a sequence of structural alphabet letters by nearest-neighbor assignment. The transformed sequences can then be compared with 1D alignment tools, which makes it possible to find proteins with high structural similarity quickly. We ran a series of experiments on proteins from the SCOP database. The results show that our structural alphabet outperforms the other alphabet systems, and that SUM-K is feasible and finds the structural alphabet that best represents protein structure. Finally, the transformed sequences are stored in our protein knowledge base, and a web-based interface lets users analyze the proteins they are interested in. |
URI: | http://140.113.39.130/cdrfb3/record/nctu/#GT009223622 http://hdl.handle.net/11536/76672 |
Appears in Collections: | Thesis |
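
The pipeline described in the abstract (assign each structural fragment to its nearest alphabet letter, then compare the resulting letter sequences with a 1D alignment) can be illustrated with a minimal sketch. This is not the thesis's SUM-K implementation: the prototype coordinates, the letter names, and the scoring values are all hypothetical, and a plain Needleman-Wunsch global alignment stands in for whichever 1D alignment tool the author used.

```python
import math

# Hypothetical prototypes: one 2-D descriptor per alphabet letter.
# In practice these would be cluster centres learned from real fragments;
# the values here are made up for illustration only.
PROTOTYPES = {
    "A": (0.0, 0.0),
    "B": (10.0, 0.0),
    "C": (0.0, 10.0),
}


def nearest_letter(descriptor, prototypes=PROTOTYPES):
    """Return the letter whose prototype is closest (Euclidean distance)."""
    return min(prototypes, key=lambda letter: math.dist(descriptor, prototypes[letter]))


def encode(descriptors, prototypes=PROTOTYPES):
    """Turn a list of fragment descriptors into a structural-alphabet string."""
    return "".join(nearest_letter(d, prototypes) for d in descriptors)


def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are assumptions; a real comparison would likely use
    a substitution matrix tuned to the alphabet.
    """
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    protein_1 = encode([(1, 1), (9, 1), (0, 8)])
    protein_2 = encode([(0, 1), (1, 9)])
    print(protein_1, protein_2, global_alignment_score(protein_1, protein_2))
```

Once structures are reduced to strings in this way, any standard sequence alignment tool can be applied to them, which is what makes the structural comparison fast.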