Full metadata record
DC Field | Value | Language
dc.contributor.author | 游富傑 | en_US
dc.contributor.author | Fu-Chieh Yu | en_US
dc.contributor.author | 何信瑩 | en_US
dc.contributor.author | Shinn-Ying Ho | en_US
dc.date.accessioned | 2014-12-12T03:00:12Z | -
dc.date.available | 2014-12-12T03:00:12Z | -
dc.date.issued | 2005 | en_US
dc.identifier.uri | http://140.113.39.130/cdrfb3/record/nctu/#GT009351509 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/79862 | -
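dc.description.abstract | In this study, we design a more accurate classifier for predicting DNA-binding sites in proteins. We use two classifiers, fuzzy k-nearest neighbors (fuzzy k-NN) and the support vector machine (SVM), to predict DNA-binding sites in proteins. We then propose a better-performing method that combines an SVM with the evolutionary information of amino acid sequences provided by position-specific scoring matrices (PSSMs) derived from multiple sequence alignments of proteins. Because the numbers of DNA-binding and non-binding residues in proteins are markedly imbalanced, two additional parameters that target this imbalance are optimized together with the SVM's own parameters, with the aim of obtaining the highest net prediction accuracy (NP, the average of the accuracy on binding residues and the accuracy on non-binding residues). To evaluate the generalization ability of the resulting SVM model, we additionally collected a set of protein-DNA complex crystal structures with low sequence similarity, PDC-59, comprising 59 protein chains, as independent test samples. With six-fold cross-validation, the SVM achieves an NP of 80.15% on the training dataset PDNA-62 and an NP of 69.54% on the independent test dataset PDC-59, which are 13.45% and 16.53% higher, respectively, than the best existing method, a neural network. In addition to the PSSM features, three physicochemical properties of amino acids related to protein-DNA interactions (solvent-accessible surface area, electric charge, and hydropathy) are also used as input features for the SVM. The results show that the SVM combined with PSSMs performs better when predicting DNA-binding sites in newly discovered proteins. | zh_TW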
dc.description.abstract | In this study, we investigate the design of accurate predictors of DNA-binding sites in proteins from amino acid sequences. Two classification methods, the support vector machine (SVM) and fuzzy k-nearest neighbors (fuzzy k-NN), are used to predict DNA-binding sites in proteins. We then propose a hybrid method with the best performance, which uses an SVM in conjunction with the evolutionary information of amino acid sequences, expressed as position-specific scoring matrices (PSSMs), to predict DNA-binding sites. Because the numbers of binding and non-binding residues in proteins are significantly unequal, two additional weights are analyzed and tuned together with the SVM parameters to maximize net prediction accuracy (NP, the average of sensitivity and specificity). To evaluate the generalization ability of the proposed method, SVM-PSSM, we additionally established a DNA-binding dataset, PDC-59, consisting of 59 protein chains with low sequence identity to each other. Using the same six-fold cross-validation procedure and PSSM features, the SVM-based method achieves NP = 80.15% on the training dataset PDNA-62 and NP = 69.54% on the independent test dataset PDC-59, improving on the existing neural-network-based method by 13.45% and 16.53% in NP for training and test accuracy, respectively. Besides the PSSM features, other physicochemical properties of amino acids related to protein-DNA interactions, such as solvent-accessible surface area, electric charge, and hydropathy index, are also adopted and analyzed. Simulation results show that SVM-PSSM performs well in predicting DNA-binding sites of novel proteins from amino acid sequences. | en_US
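dc.language.iso | en_US | en_US
dc.subject | DNA-binding sites in proteins | zh_TW
dc.subject | Position-specific scoring matrix | zh_TW
dc.subject | Support vector machine | zh_TW
dc.subject | Fuzzy k-nearest neighbors | zh_TW
dc.subject | DNA-binding proteins | en_US
dc.subject | PSSM | en_US
dc.subject | Support Vector Machine | en_US
dc.subject | fuzzy k-NN | en_US
dc.title | Prediction of DNA-Binding Sites in Proteins | zh_TW
dc.title | Prediction of DNA-Binding Sites in Proteins | en_US
dc.type | Thesis | en_US
dc.contributor.department | Institute of Bioinformatics and Systems Biology | zh_TW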
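
The abstract defines net prediction accuracy (NP) as the average of sensitivity and specificity, chosen because binding residues are far rarer than non-binding ones. The following is a minimal sketch of that metric, not code from the thesis: the function name `net_prediction` and the toy labels are assumptions made for illustration only.

```python
def net_prediction(y_true, y_pred):
    """Return (sensitivity, specificity, NP) for binary labels.

    Labels: 1 = DNA-binding residue, 0 = non-binding residue.
    NP is the mean of sensitivity and specificity, as the abstract defines it.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # binding, predicted binding
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # binding, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-binding, predicted non-binding
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-binding, predicted binding
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Toy labels (hypothetical, not data from PDNA-62 or PDC-59):
# 2 binding residues and 8 non-binding residues.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_all_negative = [0] * len(y_true)

print(net_prediction(y_true, y_pred))          # (0.5, 0.75, 0.625)
print(net_prediction(y_true, y_all_negative))  # (0.0, 1.0, 0.5)
```

On these toy labels, a predictor that labels every residue as non-binding would score 80% plain accuracy yet only NP = 0.5, which illustrates why a class-balanced measure suits an imbalanced problem of this kind.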
Appears in Collections: Theses


Files in This Item:

  1. 150901.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.