Title: Prediction of functional sites of proteins from protein structures
Authors: Sung-Huan Yu (于松桓)
Jenn-Kang Hwang (黄镇刚)
Institute of Bioinformatics and Systems Biology
Keywords: catalytic residues; protein contact number; protein centroid; thermal fluctuations; metal binding residues; chelate
Issue Date: 2007
Abstract:
Chapter 1 – active site
Owing to rapid advances in structural genomics, a large number of protein structures have been solved and deposited in the Protein Data Bank (PDB). As a result, the number of structures with unknown function has also grown, and it is increasingly important to be able to predict functional sites directly from protein structures. Active-site residues have distinct properties: they tend to lie in regions of higher packing density, closer to the structural center of the protein, and to show smaller thermal fluctuations. Based on these properties, we developed a simple method that computes profiles to detect the active sites of enzymes. Using proper threshold values for the profiles, we are able to detect up to 76% of catalytic residues with 27% false positives in a data set comprising 760 nonhomologous enzymes. If additional sequence information is included, the sequence-weighted profile method detects 80% of catalytic residues with 20% false positives. Our method does not require sequence or structural alignment or a structural template library, and it avoids solvent-accessible surface and molecular mechanical calculations. We believe that our method will be a useful tool for detecting possible active sites from protein structures and will complement other existing methods.
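The profile idea described in Chapter 1 can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything specific here is an assumption made for illustration: the inverse-square weighting in `contact_number`, the equal-weight combination of the two z-scored profiles, the default threshold of 1.0, and all function names. The thermal-fluctuation profile and the sequence weighting are omitted.

```python
import numpy as np

def zscore(profile):
    """Standardize a profile to zero mean and unit variance."""
    profile = np.asarray(profile, dtype=float)
    sd = profile.std()
    return (profile - profile.mean()) / sd if sd > 0 else np.zeros_like(profile)

def contact_number(coords):
    """Packing-density profile: sum of 1/r^2 over all other residues.
    The inverse-square weighting is an assumption, not taken from the abstract."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a residue does not count itself
    return (1.0 / d2).sum(axis=1)

def centroid_distance(coords):
    """Centrality profile: distance of each residue from the geometric center."""
    coords = np.asarray(coords, dtype=float)
    return np.linalg.norm(coords - coords.mean(axis=0), axis=1)

def candidate_sites(coords, threshold=1.0):
    """Flag residues that are densely packed and close to the centroid.
    Equal weights and the threshold value are illustrative assumptions."""
    score = 0.5 * (zscore(contact_number(coords)) - zscore(centroid_distance(coords)))
    return np.flatnonzero(score >= threshold), score

# Toy example: one point at the origin surrounded by six points on the axes.
toy = [(0, 0, 0), (4, 0, 0), (-4, 0, 0), (0, 4, 0), (0, -4, 0), (0, 0, 4), (0, 0, -4)]
hits, scores = candidate_sites(toy)
```

In the toy example only the central point is flagged, because it has both the highest contact number and the smallest distance to the centroid. In practice the coordinates would be one representative atom per residue taken from a PDB entry.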
Chapter 2 – metal binding site
Metal ions play crucial roles in organisms: they participate in enzyme catalysis, play regulatory roles, and help maintain protein structure. As the number of solved protein structures grows rapidly, predicting metal-binding sites is becoming increasingly important. For a metal ion to exist stably in a protein, it must form a chelate, and one important requirement for chelate formation is that enough atoms surround the metal ion to coordinate it. This requirement closely resembles the distance-dependent protein contact-number (CN) model introduced in Chapter 1: if a residue is surrounded by many atoms that are likely to interact with a metal ion, that residue is likely to be a metal-binding residue. In general, the atoms most likely to interact with metal ions are N, S, and O. Based on this idea, we follow the CN approach but replace Cα with atoms such as N, S, and O to predict metal-binding residues. On Sodhi's dataset, this method correctly predicts 72.4% of Ca, 94.7% of Cu, 86.5% of Fe, 77.6% of Mg, 88.5% of Mn, and 91.5% of Zn binding sites.
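A contact number restricted to potential coordinating atoms, as described in Chapter 2, might be sketched as follows. This is an illustration under stated assumptions rather than the thesis's implementation: the inverse-square weighting, the choice of one reference point per residue, the exclusion of a residue's own atoms, and the names `DONOR_ELEMENTS` and `donor_contact_number` are all hypothetical.

```python
import numpy as np

# Elements treated as potential metal-coordinating atoms (from the abstract).
DONOR_ELEMENTS = {"N", "O", "S"}

def donor_contact_number(ref_points, atoms, min_d2=1e-6):
    """Score each residue by the N/O/S atoms around its reference point.

    ref_points: (n_residues, 3) array, one reference coordinate per residue
                (which atom to use as reference is an assumption).
    atoms: iterable of (element, residue_index, (x, y, z)).
    Each donor atom contributes 1/r^2 (assumed weighting) to every residue
    except the one it belongs to (also an assumption).
    """
    ref = np.asarray(ref_points, dtype=float)
    scores = np.zeros(len(ref))
    for element, res_idx, xyz in atoms:
        if element not in DONOR_ELEMENTS:
            continue  # carbon and other atoms do not contribute
        d2 = ((ref - np.asarray(xyz, dtype=float)) ** 2).sum(axis=1)
        weights = 1.0 / np.maximum(d2, min_d2)  # guard against zero distance
        weights[res_idx] = 0.0  # skip the residue's own atoms
        scores += weights
    return scores

# Toy example: two neighboring residues and one isolated residue.
refs = [(0, 0, 0), (3, 0, 0), (20, 0, 0)]
toy_atoms = [("N", 0, (1, 0, 0)), ("O", 1, (2, 0, 0)), ("C", 2, (19, 0, 0))]
scores = donor_contact_number(refs, toy_atoms)
```

Residues with high scores would then be ranked or thresholded as candidate metal-binding residues; the isolated residue in the toy example scores far lower than the two neighbors.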
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009551501
http://hdl.handle.net/11536/39429
Appears in Collections: Thesis


Files in This Item:

  1. 150101.pdf
