Consensus scoring criteria for improving enrichment in virtual screening

doi:10.1021/ci050034w

Full metadata record

DC Field	Value	Language
dc.contributor.author	Yang, JM	en_US
dc.contributor.author	Chen, YF	en_US
dc.contributor.author	Shen, TW	en_US
dc.contributor.author	Kristal, BS	en_US
dc.contributor.author	Hsu, DF	en_US
dc.date.accessioned	2014-12-08T15:18:48Z	-
dc.date.available	2014-12-08T15:18:48Z	-
dc.date.issued	2005-07-01	en_US
dc.identifier.issn	1549-9596	en_US
dc.identifier.uri	http://dx.doi.org/10.1021/ci050034w	en_US
dc.identifier.uri	http://hdl.handle.net/11536/13517	-
dc.description.abstract	Virtual screening of molecular compound libraries is a potentially powerful and inexpensive method for the discovery of novel lead compounds for drug development. The major weakness of virtual screening, the inability to consistently identify true positives (leads), is likely due to our incomplete understanding of the chemistry involved in ligand binding and the subsequently imprecise scoring algorithms. It has been demonstrated that combining multiple scoring functions (consensus scoring) improves the enrichment of true positives. Previous efforts at consensus scoring have largely focused on empirical results, but they have yet to provide a theoretical analysis that gives insight into real features of combinations and data fusion for virtual screening. Results: We demonstrate that combining multiple scoring functions improves the enrichment of true positives only if (a) each of the individual scoring functions has relatively high performance and (b) the individual scoring functions are distinctive. Notably, these two prediction variables are previously established criteria for the performance of data fusion approaches using either rank or score combinations. This work, thus, establishes a potential theoretical basis for the probable success of data fusion approaches to improve yields in in silico screening experiments. Furthermore, it is similarly established that the second criterion (b) can, in at least some cases, be functionally defined as the area between the rank versus score plots generated by the two (or more) algorithms. Because rank-score plots are independent of the performance of the individual scoring function, this establishes a second theoretically defined approach to determining the likely success of combining data from different predictive algorithms. This approach is, thus, useful in practical settings in the virtual screening process when the performance of at least two individual scoring functions (such as in criterion a) can be estimated as having a high likelihood of having high performance, even if no training sets are available. We provide initial validation of this theoretical approach using data from five scoring systems with two evolutionary docking algorithms on four targets, thymidine kinase, human dihydrofolate reductase, and estrogen receptors of antagonists and agonists. Our procedure is computationally efficient, able to adapt to different situations, and scalable to a large number of compounds as well as to a greater number of combinations. Results of the experiment show a fairly significant improvement (vs single algorithms) in several measures of scoring quality, specifically "goodness-of-hit" scores, false positive rates, and "enrichment". This approach (available online at http://gemdock.life.nctu.edu.tw/dock/download.php) has practical utility for cases where the basic tools are known or believed to be generally applicable, but where specific training sets are absent.	en_US
dc.language.iso	en_US	en_US
dc.title	Consensus scoring criteria for improving enrichment in virtual screening	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1021/ci050034w	en_US
dc.identifier.journal	JOURNAL OF CHEMICAL INFORMATION AND MODELING	en_US
dc.citation.volume	45	en_US
dc.citation.issue	4	en_US
dc.citation.spage	1134	en_US
dc.citation.epage	1146	en_US
dc.contributor.department	生物科技學系	zh_TW
dc.contributor.department	生物資訊及系統生物研究所	zh_TW
dc.contributor.department	Department of Biological Science and Technology	en_US
dc.contributor.department	Institute of Bioinformatics and Systems Biology	en_US
dc.identifier.wosnumber	WOS:000230864300035	-
dc.citation.woscount	112	-
Appears in Collections:	Articles
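
The abstract describes two ways of fusing scoring functions (rank combination and score combination) and a diversity measure defined as the area between rank versus score plots. The sketch below illustrates those ideas on toy data. It is not the authors' code: the function names, the min-max normalization, the averaging over ranks, and the assumption that higher scores are better (for example, negated docking energies) are all illustrative choices, and both scoring functions are assumed to score the same set of compounds.

```python
from typing import Dict, List

# Hypothetical representation: compound id -> score, higher is better (assumption).
Scores = Dict[str, float]


def normalize(scores: Scores) -> Scores:
    """Min-max scale scores to [0, 1] so different functions are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {c: 0.0 for c in scores}
    return {c: (s - lo) / span for c, s in scores.items()}


def ranks(scores: Scores) -> Dict[str, int]:
    """Rank 1 is the best score; ties are broken by compound id."""
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return {c: i + 1 for i, c in enumerate(ordered)}


def rank_score_function(scores: Scores) -> List[float]:
    """Normalized score found at rank 1, 2, ..., n (the rank versus score plot)."""
    return sorted(normalize(scores).values(), reverse=True)


def rank_score_area(a: Scores, b: Scores) -> float:
    """Mean absolute gap between two rank-score plots, a discrete 'area'."""
    fa, fb = rank_score_function(a), rank_score_function(b)
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)


def rank_combination(a: Scores, b: Scores) -> List[str]:
    """Order compounds by the sum of their ranks (lower sum is better)."""
    ra, rb = ranks(a), ranks(b)
    return sorted(ra, key=lambda c: (ra[c] + rb[c], c))


def score_combination(a: Scores, b: Scores) -> List[str]:
    """Order compounds by the sum of their normalized scores (higher is better)."""
    na, nb = normalize(a), normalize(b)
    return sorted(na, key=lambda c: (-(na[c] + nb[c]), c))


if __name__ == "__main__":
    # Toy scores for four compounds from two hypothetical scoring functions.
    func_a = {"c1": 9.0, "c2": 7.0, "c3": 4.0, "c4": 1.0}
    func_b = {"c1": 5.0, "c2": 8.0, "c3": 7.5, "c4": 2.0}
    print("diversity (area):", rank_score_area(func_a, func_b))
    print("rank combination:", rank_combination(func_a, func_b))
    print("score combination:", score_combination(func_a, func_b))
```

Note that the area is computed from the sorted score profiles alone, without knowing which compounds are true positives, which matches the abstract's point that rank-score plots are independent of the performance of the individual scoring function.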

Files in This Item:

000230864300035.pdf
