Establishing an Interpretable Fuzzy-Rule Knowledge Base for Predicting and Analyzing DNA-Binding Domains

Title:	Establishing an Interpretable Fuzzy-Rule Knowledge Base for Predicting and Analyzing DNA-Binding Domains (建構可解讀模糊規則知識庫來預測與分析DNA結合的蛋白質)
Author:	黃慧玲 Huang Hui-Ling, Department of Biological Science and Technology, National Chiao Tung University
Keywords:	DNA-binding domains; feature selection; genetic algorithm; support vector machine; fuzzy rules; knowledge acquisition; physicochemical properties; position-specific scoring matrix; protein function prediction
Issue date:	2011
Abstract:	DNA-binding domains/proteins are functional proteins that play vital roles in essential biological activities in the cell, such as DNA transcription. We recently published an accurate support vector machine (SVM)-based method for predicting DNA-binding domains, an important topic in bioinformatics research. However, most existing SVM-based methods rely on a large number of real-valued features; they predict well but offer little human interpretability. This project aims to establish an interpretable fuzzy-rule knowledge base for predicting and analyzing DNA-binding domains. The research procedure consists of five stages. 1) Identify informative physicochemical properties of sequences by using a machine learning approach to distinguish binding from non-binding domains. 2) Propose an evolutionary fuzzy-rule classifier that collects interpretable fuzzy rules based on the identified physicochemical properties. 3) Establish an interpretable fuzzy-rule knowledge base for predicting and analyzing DNA-binding domains. 4) Design an ensemble classifier that makes predictions using the fuzzy rules of the knowledge base. 5) Validate the DNA-binding knowledge by a) analyzing structures of DNA-binding proteins, b) analyzing known DNA-binding sites, and c) investigating physicochemical properties reported in the experimental literature to affect DNA binding. In support of the proposed approach, we have obtained positive preliminary results: 1) a set of candidate physicochemical properties, 2) a prototype of the evolutionary fuzzy-rule classifier, and 3) a feasible, compact, and accurate set of fuzzy rules. The most important task is to carefully validate the knowledge base and provide a system for predicting and analyzing DNA-binding domains/proteins. The system can explain its predictions with the interpretable fuzzy rules and output the corresponding decision rules.
Government document #:	NSC100-2221-E009-130
URI:	http://hdl.handle.net/11536/99221 https://www.grb.gov.tw/search/planDetail?id=2342550&docId=369355
Appears in Collections:	Research Plans
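
To illustrate what the abstract means by an interpretable fuzzy rule that both predicts and explains, here is a minimal Python sketch. It is not the project's classifier: the two features (fractions of positively and negatively charged residues), the membership thresholds, and the two hand-written rules are all assumptions chosen for illustration, whereas the project proposes to select properties and evolve rules automatically.

```python
# Illustrative sketch only: features, thresholds, and rules are hypothetical,
# not taken from the project described in this record.

def ramp_up(lo, hi):
    """Membership for HIGH: 0 at or below lo, 1 at or above hi, linear between."""
    def mu(x):
        return min(1.0, max(0.0, (x - lo) / (hi - lo)))
    return mu

def ramp_down(lo, hi):
    """Membership for LOW: the complement of ramp_up on the same interval."""
    up = ramp_up(lo, hi)
    return lambda x: 1.0 - up(x)

def features(seq):
    """Compute two simple charge-based properties of an amino-acid sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    return {
        "pos_frac": sum(seq.count(a) for a in "KR") / n,  # Lys, Arg
        "neg_frac": sum(seq.count(a) for a in "DE") / n,  # Asp, Glu
    }

# Each rule: (readable text, {feature: membership function}, class label).
RULES = [
    ("IF pos_frac is HIGH AND neg_frac is LOW THEN binding",
     {"pos_frac": ramp_up(0.05, 0.20), "neg_frac": ramp_down(0.05, 0.20)},
     "binding"),
    ("IF pos_frac is LOW THEN non-binding",
     {"pos_frac": ramp_down(0.05, 0.20)},
     "non-binding"),
]

def firing_strength(rule, x):
    """Fuzzy AND of the rule's conditions, taken as the minimum membership."""
    _, conditions, _ = rule
    return min(mu(x[name]) for name, mu in conditions.items())

def classify(seq):
    """Return (label, rule text, strength) of the most strongly firing rule."""
    x = features(seq)
    best = max(RULES, key=lambda rule: firing_strength(rule, x))
    return best[2], best[0], firing_strength(best, x)

if __name__ == "__main__":
    label, rule, strength = classify("KRKRKRAAAA")
    print(label, "|", rule, "|", strength)
```

Because the winning rule is returned together with the label, each prediction comes with a human-readable justification, which is the property the abstract contrasts with SVM classifiers built on many real-valued features.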

Files in This Item:

1002221E009130.PDF

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.