Title: Establishing an Interpretable Fuzzy-Rule Knowledge Base for Predicting and Analyzing DNA-Binding Domains
Original title: 建構可解讀模糊規則知識庫來預測與分析DNA結合的蛋白質
Author: Huang Hui-Ling (黃慧玲)
Department of Biological Science and Technology, National Chiao Tung University
Keywords: DNA-binding domains; feature selection; genetic algorithm; support vector machine; fuzzy rules; knowledge acquisition; physicochemical properties; position-specific scoring matrix; protein function prediction
Issue date: 2011
Abstract: DNA-binding domains/proteins are functional proteins that play vital roles in essential biological activities in the cell, such as DNA transcription. We recently published an accurate support vector machine (SVM)-based method for predicting DNA-binding domains, which is an important topic in bioinformatics research. However, most existing SVM-based methods rely on a large number of real-valued features; they predict well but offer little human interpretability of what has been learned.

This project aims to establish an interpretable fuzzy-rule knowledge base for predicting and analyzing DNA-binding domains. The research procedure consists of five stages: 1) Identify informative physicochemical properties of sequences by using a machine learning approach to distinguish binding from non-binding domains. 2) Propose an evolutionary fuzzy-rule classifier that collects interpretable fuzzy rules based on the identified physicochemical properties. 3) Establish an interpretable fuzzy-rule knowledge base for predicting and analyzing DNA-binding domains. 4) Design an ensemble classifier that makes predictions using the fuzzy rules in the knowledge base. 5) Validate the DNA-binding knowledge by a) analyzing the structures of DNA-binding proteins, b) analyzing known DNA-binding sites, and c) examining physicochemical properties that the experimental literature reports as affecting DNA binding.

In support of the proposed approach, we have obtained some positive preliminary results: 1) a set of potential physicochemical properties, 2) a prototype of the evolutionary fuzzy-rule classifier, and 3) a feasible, compact, and accurate set of fuzzy rules. The most important task is to carefully validate the knowledge base and to provide a system for predicting and analyzing DNA-binding domains/proteins. The system can explain its predictions by outputting the decision rules, expressed as interpretable fuzzy rules.
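As an illustration of what an interpretable fuzzy rule over physicochemical properties might look like, here is a minimal Python sketch. It is not the project's classifier. The feature names (`positive_charge`, `hydrophobicity`), the three membership functions, the rules, and the rule weights are all hypothetical. The sketch also assumes each feature has been normalized to [0, 1] and that the class is decided by the single rule with the highest firing strength.

```python
from dataclasses import dataclass

# Hypothetical fuzzy sets over a feature normalized to [0, 1].
MEMBERSHIP = {
    "low": lambda x: max(0.0, 1.0 - 2.0 * x),
    "medium": lambda x: max(0.0, 1.0 - 2.0 * abs(x - 0.5)),
    "high": lambda x: max(0.0, 2.0 * x - 1.0),
}


@dataclass
class FuzzyRule:
    antecedent: dict      # feature name -> linguistic term ("low", "medium", "high")
    label: str            # class predicted when the rule fires
    weight: float = 1.0   # rule certainty (assumed; would normally be learned)

    def strength(self, features):
        # Firing strength: minimum membership degree across all conditions,
        # scaled by the rule weight.
        degrees = [MEMBERSHIP[term](features[name])
                   for name, term in self.antecedent.items()]
        return self.weight * min(degrees)

    def describe(self):
        conds = " AND ".join(f"{name} is {term}"
                             for name, term in self.antecedent.items())
        return f"IF {conds} THEN {self.label}"


def classify(rules, features):
    # Single-winner inference: the strongest rule decides the class and
    # doubles as the human-readable explanation of the prediction.
    best = max(rules, key=lambda rule: rule.strength(features))
    return best.label, best.describe(), best.strength(features)


# Hypothetical rule base, for illustration only.
RULES = [
    FuzzyRule({"positive_charge": "high", "hydrophobicity": "low"}, "binding"),
    FuzzyRule({"positive_charge": "low"}, "non-binding"),
    FuzzyRule({"positive_charge": "medium", "hydrophobicity": "high"},
              "non-binding", weight=0.8),
]

if __name__ == "__main__":
    label, rule, strength = classify(
        RULES, {"positive_charge": 0.9, "hydrophobicity": 0.2})
    print(label, "|", rule, "|", round(strength, 3))
```

Because the winning rule is returned alongside the label, the prediction comes with its own explanation. This is the kind of output the abstract describes, though the actual rule base and inference scheme would come from the evolutionary classifier.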
Government document number: NSC100-2221-E009-130
URI: http://hdl.handle.net/11536/99221
https://www.grb.gov.tw/search/planDetail?id=2342550&docId=369355
Appears in collections: Research Plans


Files in this item:

  1. 1002221E009130.PDF

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.