Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Lin, Hsin-Nan | en_US |
| dc.contributor.author | Sung, Ting-Yi | en_US |
| dc.contributor.author | Ho, Shinn-Ying | en_US |
| dc.contributor.author | Hsu, Wen-Lian | en_US |
| dc.date.accessioned | 2014-12-08T15:05:49Z | - |
| dc.date.available | 2014-12-08T15:05:49Z | - |
| dc.date.issued | 2010-12-02 | en_US |
| dc.identifier.issn | 1471-2164 | en_US |
| dc.identifier.uri | http://dx.doi.org/10.1186/1471-2164-11-S4-S4 | en_US |
| dc.identifier.uri | http://hdl.handle.net/11536/4351 | - |
| dc.description.abstract | Background: When characterizing the structural topology of proteins, protein secondary structure (PSS) plays an important role in analyzing and modeling protein structures because it represents the local conformation of amino acids into regular structures. Although PSS prediction has been studied for decades, the prediction accuracy has reached a bottleneck at around 80%, and further improvement is very difficult. Results: In this paper, we present an improved dictionary-based PSS prediction method called SymPred, and a meta-predictor called SymPsiPred. We adopt the concept behind natural language processing techniques and propose synonymous words to capture local sequence similarities in a group of similar proteins. A synonymous word is an n-gram pattern of amino acids that reflects the sequence variation in a protein's evolution. We generate a protein-dependent synonymous dictionary from a set of protein sequences for PSS prediction. On a large non-redundant dataset of 8,297 protein chains (DsspNr-25), the average Q3 values of SymPred and SymPsiPred are 81.0% and 83.9%, respectively. On the two latest independent test sets (EVA_Set1 and EVA_Set2), the average Q3 of SymPred is 78.8% and 79.2%, respectively. SymPred outperforms other existing methods by 1.4% to 5.4%. We study two factors that may affect the performance of SymPred and find that it is very sensitive to the number of proteins of both known and unknown structures. This finding implies that SymPred and SymPsiPred have the potential to achieve higher accuracy as the number of protein sequences in the NCBInr and PDB databases increases. Conclusions: Our experimental results show that local similarities in protein sequences typically exhibit conserved structures, which can be used to improve the accuracy of secondary structure prediction. As an application of synonymous words, we demonstrate a sequence alignment generated from the distribution of synonymous words shared by a pair of protein sequences. The two sequences, which are very dissimilar at the sequence level but very similar at the structural level, can be aligned nearly perfectly. The SymPred and SymPsiPred prediction servers are available at http://bio-cluster.iis.sinica.edu.tw/SymPred/. | en_US |
| dc.language.iso | en_US | en_US |
| dc.title | Improving protein secondary structure prediction based on short subsequences with local structure similarity | en_US |
| dc.type | Article; Proceedings Paper | en_US |
| dc.identifier.doi | 10.1186/1471-2164-11-S4-S4 | en_US |
| dc.identifier.journal | BMC GENOMICS | en_US |
| dc.citation.volume | 11 | en_US |
| dc.citation.issue | | en_US |
| dc.citation.epage | | en_US |
| dc.contributor.department | 生物資訊及系統生物研究所 | zh_TW |
| dc.contributor.department | Institute of Bioinformatics and Systems Biology | en_US |
| dc.identifier.wosnumber | WOS:000289200700004 | - |
Appears in Collections: Conference Papers


Files in This Item:

  1. 000289200700004.pdf

If the file is a zip archive, download and extract it, then open index.html in the extracted folder with a browser to view the full text.