(2012). Structure-based functional annotation methods: development and assessment on homology models.

Structure-based functional annotation methods: development and assessment on homology models

MANGONE, IOLANDA
2012-01-01

Abstract

One of the main goals in bioinformatics is the development of tools for protein functional annotation. There are fundamentally two approaches to inferring the function of a protein: one based on the comparison of protein sequences (Altschul et al., 1997; Bairoch, 1991; Pearson, 1990) and another based on the comparison of protein structures (Ausiello et al., 2007; Pearl et al., 2001; Thornton et al., 2000; Whisstock and Lesk, 2003). For assigning a molecular function to a new protein, the three-dimensional structure is more informative than the amino acid sequence alone (Watson et al., 2007). For this reason, many automated methods have been developed to infer the function of a protein of known structure using comparative approaches (for a review, see Gherardini and Helmer-Citterich, 2008). Because these methods are important to biologists and other researchers, they need to be user-friendly and easy to use even in laboratories without a bioinformatician. The evolutionary information of protein families (Punta et al., 2012) can be used to improve the performance of structure-based methods. It is well known that the more a residue is conserved during the evolution of homologous proteins, the more important it is for preserving their function. The amino acids in enzyme catalytic sites, for example, remain unchanged during evolution even among proteins with very low sequence identity. Large-scale sequencing projects (Abecasis et al., 2010; Sawicki et al., 1993) have expanded protein sequence databases much faster than the collection of experimentally solved protein structures. Protein sequence databases contain more than 20 million entries (UniProt Consortium, 2013), while the protein structure database contains only 79 thousand (Rose et al., 2013).
In the absence of an experimentally determined structure, different bioinformatics tools can be used to predict protein structure. If one or more proteins of known structure share more than 30% sequence identity with the protein of interest, homology modeling can transfer structural information from such a template to the target protein (Bork et al., 1994). Because the model is built by transferring the template's structure, its quality depends on the sequence identity between target and template (Chothia and Lesk, 1986). The aims of our work are:
- To explore the possibility of using sequence conservation as a parameter to improve the performance of structure-based functional prediction methods.
- To develop webservers that provide full and easy access to structure-based functional prediction methods.
- To analyse whether, and within which limits, structure-based methods can be applied to protein models. We also analyse how the prediction performance of different structure-based functional annotation methods decreases as overall model quality decreases.
We developed a new method, called PFAMer, to determine how evolutionarily conserved the residues of a protein structure are. PFAMer derives its conservation scores from the PFAM multiple sequence alignments. We successfully applied this procedure to Pfinder and PDBinder, two structure-based methods developed in our laboratory that identify binding sites on protein structures. Pfinder identifies phosphate-binding sites, while PDBinder identifies binding pockets regardless of the specific ligand they bind. Adding the conservation parameter to Pfinder reduces the number of false positive (FP) predictions and improves performance by 3%. Applying PFAMer to PDBinder improves performance by 5%.
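The abstract does not say how PFAMer turns a PFAM alignment into per-residue scores. The sketch below is an illustration only. It uses one common conservation measure: one minus the Shannon entropy of each alignment column, normalized by log(20). This measure, the choice to ignore gaps, and the function names `column_conservation` and `conservation_profile` are all assumptions. They are not PFAMer's actual implementation. Mapping alignment columns back to residues of the structure is not shown.

```python
import math
from collections import Counter

# Assumption: only the 20 standard amino acids count; gaps and other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_conservation(column: str) -> float:
    """Hypothetical score for one alignment column: 1 - H / log(20).

    Returns 1.0 for a fully conserved column. Returns 0.0 if the column
    holds no residues (e.g. only gaps). Lowercase insert-state residues
    are upper-cased before counting.
    """
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    # Shannon entropy (natural log) of the residue distribution in this column
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(len(AMINO_ACIDS))


def conservation_profile(alignment: list[str]) -> list[float]:
    """Score every column of an alignment given as equal-length strings."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        column_conservation("".join(seq[i] for seq in alignment))
        for i in range(length)
    ]
```

For example, `conservation_profile(["ACDK", "ACEK", "ACDR"])` scores the first two columns as fully conserved (1.0) and the last two at about 0.79. One possible use of such a score, again only as an illustration, is to discard candidate binding-site residues whose score falls below a cutoff. This is one way a conservation parameter could reduce false positives.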
To make the structure-based methods developed in our laboratory more accessible to the scientific community, we developed two webservers, Phosfinder and webPDBinder. They are based on the improved versions of Pfinder and PDBinder. In the last part of this work, we analyzed how the performance of structure-based annotation methods degrades when they are applied to models instead of X-ray structures. To this end, we developed an automated procedure that compares the performance of different structure-based functional prediction methods on an experimentally solved structure and on a set of homology models of different quality. Each method is tested on the protein dataset proposed by its authors in the original publication. It is also tested on a set of homology models built for each structure in that dataset. To obtain models of different quality, we used a set of fixed thresholds. For each threshold, only templates whose sequence similarity to the solved structure is below that threshold are used. We selected several methods for each category of functional annotation. We measured their performance using the F-score or, where applicable, the Matthews correlation coefficient (MCC). The applicability of functional prediction methods to protein models had never been explored, even though most of the structural information now available is stored in 3D models. Sensitivity to model quality should become an evaluation criterion when comparing structure-based functional annotation methods, as it gives valuable information about their applicability to real-world cases. Analysing the features of the different methods can give hints about why they differ in sensitivity to model quality.
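Both scores used in the assessment can be computed from the counts of a prediction's outcomes: true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). The sketch below makes two assumptions. First, it takes "F-score" to mean the balanced F1 score. Second, it returns 0.0 when a score is undefined, which is a common but not universal convention. A score is undefined, for example, when there are no positive predictions and no positive cases.

```python
import math


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F1 score: harmonic mean of precision and recall.

    Written as 2TP / (2TP + FP + FN); returns 0.0 if the denominator is zero.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 by convention if any marginal total is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Take TP = 8, FP = 2, FN = 2 and TN = 88. With these counts, `f_score(8, 2, 2)` gives 0.8 and `mcc(8, 88, 2, 2)` gives about 0.78. Unlike the F-score, MCC also uses TN, so it reflects performance on both the positive and the negative class.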
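Year: 2012
Academic year: 2012/2013
Doctoral programme: Cell and Molecular Biology
Cycle: 26
Scientific-disciplinary sector: BIO/11 - Molecular Biology
Scientific-disciplinary sector: BIO/10 - Biochemistry
Language: English
Type: Doctoral thesis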
Full text: Iolanda Mangone 2013.pdf (Adobe PDF, 9.56 MB), available to authorized users only; license not specified.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/202203