Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition

doi:10.1186/1471-2105-13-S17-S3

Full metadata record

DC Field	Value	Language
dc.contributor.author	Huang, Hui-Ling	en_US
dc.contributor.author	Charoenkwan, Phasit	en_US
dc.contributor.author	Kao, Te-Fen	en_US
dc.contributor.author	Lee, Hua-Chin	en_US
dc.contributor.author	Chang, Fang-Lin	en_US
dc.contributor.author	Huang, Wen-Lin	en_US
dc.contributor.author	Ho, Shinn-Jang	en_US
dc.contributor.author	Shu, Li-Sun	en_US
dc.contributor.author	Chen, Wen-Liang	en_US
dc.contributor.author	Ho, Shinn-Ying	en_US
dc.date.accessioned	2014-12-08T15:28:48Z	-
dc.date.available	2014-12-08T15:28:48Z	-
dc.date.issued	2012-12-13	en_US
dc.identifier.issn	1471-2105	en_US
dc.identifier.uri	http://dx.doi.org/10.1186/1471-2105-13-S17-S3	en_US
dc.identifier.uri	http://hdl.handle.net/11536/20829	-
dc.description.abstract	Background: Existing methods for predicting protein solubility on overexpression in Escherichia coli improve performance by using ensemble classifiers, such as two-stage support vector machine (SVM) based classifiers, and a number of feature types, such as physicochemical properties and amino acid and dipeptide composition, combined with feature selection. A simple and easily interpretable method for predicting protein solubility is desirable as an alternative to these complex SVM-based methods. Results: This study proposes a novel scoring card method (SCM) that uses dipeptide composition alone to estimate solubility scores of sequences for predicting protein solubility. SCM calculates the propensity of each of the 400 dipeptides to be soluble using statistical discrimination between the soluble and insoluble proteins of a training data set. The propensity scores of all dipeptides are then further optimized using an intelligent genetic algorithm. The solubility score of a sequence is the sum of all dipeptide propensity scores, each weighted by the sequence's composition of that dipeptide. To evaluate SCM by performance comparisons, four data sets with different sizes and degrees of variation in experimental conditions were used. The results show that SCM, a simple method with interpretable dipeptide propensities, performs promisingly compared with existing SVM-based ensemble methods that use a number of feature types. Furthermore, the propensities of dipeptides and the solubility scores of sequences can provide insights into protein solubility. For example, analysis of the dipeptide scores shows that α-helix structures and thermophilic proteins have a high propensity to be soluble. Conclusions: The propensities of individual dipeptides to be soluble vary for proteins under different experimental conditions. To predict protein solubility accurately using SCM, it is better to customize the score card of dipeptide propensities using a training data set obtained under the same specified experimental conditions. The proposed SCM, with its solubility scores and dipeptide propensities, can be easily applied to protein function prediction problems in which dipeptide composition features play an important role. Availability: The datasets used, the source code of SCM, and supplementary files are available at http://iclab.life.nctu.edu.tw/SCM/.	en_US
dc.language.iso	en_US	en_US
dc.title	Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition	en_US
dc.type	Article; Proceedings Paper	en_US
dc.identifier.doi	10.1186/1471-2105-13-S17-S3	en_US
dc.identifier.journal	BMC BIOINFORMATICS	en_US
dc.citation.volume	13	en_US
dc.citation.issue		en_US
dc.citation.epage		en_US
dc.contributor.department	Department of Biological Science and Technology	en_US
dc.contributor.department	Institute of Bioinformatics and Systems Biology	en_US
dc.identifier.wosnumber	WOS:000312985100003	-
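The scoring step summarized in the abstract (a sequence's dipeptide composition weighted by per-dipeptide propensity scores, then compared against a threshold) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' SCM code, which is distributed from the site listed in the abstract: the initial-propensity statistic (a difference of mean compositions) and the threshold rule (midpoint of the two class mean scores) are hypothetical stand-ins, the intelligent-genetic-algorithm refinement of the scores is omitted, and the toy sequences are made up.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered dipeptides ("AA", "AC", ..., "YY").
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Fraction of each dipeptide among the overlapping dipeptides of seq.

    Pairs containing non-standard residues are skipped; if no valid pair
    remains, every fraction is 0.0.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return {d: (c / total if total else 0.0) for d, c in counts.items()}


def initial_propensities(soluble, insoluble):
    """HYPOTHETICAL statistic: mean composition of each dipeptide in the
    soluble sequences minus its mean composition in the insoluble ones."""
    def mean_composition(seqs):
        comps = [dipeptide_composition(s) for s in seqs]
        return {d: sum(c[d] for c in comps) / len(comps) for d in DIPEPTIDES}

    sol, ins = mean_composition(soluble), mean_composition(insoluble)
    return {d: sol[d] - ins[d] for d in DIPEPTIDES}


def solubility_score(seq, propensity):
    """Sum of dipeptide propensity scores weighted by the dipeptide composition."""
    comp = dipeptide_composition(seq)
    return sum(comp[d] * propensity[d] for d in DIPEPTIDES)


def fit_threshold(soluble, insoluble, propensity):
    """HYPOTHETICAL rule: midpoint between the mean scores of the two classes."""
    def mean_score(seqs):
        return sum(solubility_score(s, propensity) for s in seqs) / len(seqs)

    return (mean_score(soluble) + mean_score(insoluble)) / 2


def predict_soluble(seq, propensity, threshold):
    """Predict 'soluble' when the solubility score exceeds the threshold."""
    return solubility_score(seq, propensity) > threshold


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only (not from the paper's data sets).
    soluble = ["AEKLAEKLAEKL", "EEKKAEELKK"]
    insoluble = ["WWFFLLIIWW", "FLIWVFLIWV"]
    prop = initial_propensities(soluble, insoluble)
    thr = fit_threshold(soluble, insoluble, prop)
    print(predict_soluble("AEKLEEKK", prop, thr))  # True
    print(predict_soluble("WFLIWV", prop, thr))    # False
```

Keeping the scores in a plain mapping from dipeptide to value means each propensity stays directly inspectable, which is the interpretability the abstract emphasizes; in a fuller version, that mapping is what an optimizer such as a genetic algorithm would tune to improve prediction accuracy on the training data.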
Appears in Collections:	Conference Papers

Files in This Item:

000312985100003.pdf
