A simple strategy to enhance the speed of protein secondary structure prediction without sacrificing accuracy

doi:10.1371/journal.pone.0235153

Full metadata record

DC Field	Value	Language
dc.contributor.author	Juan, Sheng-Hung	en_US
dc.contributor.author	Chen, Teng-Ruei	en_US
dc.contributor.author	Lo, Wei-Cheng	en_US
dc.date.accessioned	2020-10-05T01:59:44Z	-
dc.date.available	2020-10-05T01:59:44Z	-
dc.date.issued	2020-06-30	en_US
dc.identifier.issn	1932-6203	en_US
dc.identifier.uri	http://dx.doi.org/10.1371/journal.pone.0235153	en_US
dc.identifier.uri	http://hdl.handle.net/11536/154866	-
dc.description.abstract	The secondary structure prediction of proteins is a classic topic of computational structural biology with a variety of applications. During the past decade, the accuracy of prediction achieved by state-of-the-art algorithms has been >80%; meanwhile, the time cost of prediction has increased rapidly because of the exponential growth of fundamental protein sequence data. Based on literature studies and preliminary observations on the relationships between the size/homology of the fundamental protein dataset and the speed/accuracy of predictions, we raised two hypotheses that might help determine the main factors influencing the efficiency of secondary structure prediction. Experimental results of size and homology reductions of the fundamental protein dataset supported those hypotheses. They revealed that shrinking the size of the dataset could substantially cut the time cost of prediction with a slight decrease in accuracy, whereas homology reduction of the dataset could, on the contrary, increase accuracy. Moreover, the Shannon information entropy could be applied to explain how accuracy was influenced by the size and homology of the dataset. Based on these findings, we proposed that a proper combination of size and homology reductions of the protein dataset could speed up secondary structure prediction while preserving the high accuracy of state-of-the-art algorithms. When the proposed strategy was tested with the fundamental protein dataset of the year 2018 provided by the Universal Protein Resource, the speed of prediction was enhanced more than 20-fold while all accuracy measures remained equivalently high. These findings are expected to be helpful for improving the efficiency of research and applications that depend on the secondary structure prediction of proteins. To make future implementations of the proposed strategy easy, we have established a database of size- and homology-reduced protein datasets at http://10.life.nctu.etu.tw/UniRefNR.	en_US
dc.language.iso	en_US	en_US
dc.title	A simple strategy to enhance the speed of protein secondary structure prediction without sacrificing accuracy	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1371/journal.pone.0235153	en_US
dc.identifier.journal	PLOS ONE	en_US
dc.citation.volume	15	en_US
dc.citation.issue	6	en_US
dc.citation.spage	0	en_US
dc.citation.epage	0	en_US
dc.contributor.department	生物科技學系	zh_TW
dc.contributor.department	生物資訊及系統生物研究所	zh_TW
dc.contributor.department	生物資訊研究中心	zh_TW
dc.contributor.department	Department of Biological Science and Technology	en_US
dc.contributor.department	Institute of Bioinformatics and Systems Biology	en_US
dc.contributor.department	Center for Bioinformatics Research	en_US
dc.identifier.wosnumber	WOS:000546956600046	en_US
dc.citation.woscount	0	en_US
Appears in Collections:	Articles