MOTIVATION: Unravelling the rules underlying protein-protein and protein-ligand interactions is a crucial step in understanding cell machinery. Peptide recognition modules (PRMs) are globular protein domains which focus their binding targets on short protein sequences and play a key role in the frame of protein-protein interactions. High-throughput techniques permit the whole proteome scanning of each domain, but they are characterized by a high incidence of false positives. In this context, there is a pressing need for the development of in silico experiments to validate experimental results and of computational tools for the inference of domain-peptide interactions.

RESULTS: We focused on the SH3 domain family and developed a machine-learning approach for inferring interaction specificity. SH3 domains are well-studied PRMs which typically bind proline-rich short sequences characterized by the PxxP consensus. The binding information is known to be held in the conformation of the domain surface and in the short sequence of the peptide. Our method relies on interaction data from high-throughput techniques and benefits from the integration of sequence and structure data of the interacting partners. Here, we propose a novel encoding technique aimed at representing binding information on the basis of the domain-peptide contact residues in complexes of known structure. Remarkably, the new encoding requires few variables to represent an interaction, thus avoiding the 'curse of dimension'. Our results display an accuracy >90% in detecting new binders of known SH3 domains, thus outperforming neural models on standard binary encodings, profile methods and recent statistical predictors. The method, moreover, shows a generalization capability, inferring specificity of unknown SH3 domains displaying some degree of similarity with the known data.

Ferraro, E., Via, A., Ausiello, G., Helmer Citterich, M. (2006). A novel structure-based encoding for machine-learning applied to the inference of SH3 domain specificity. Bioinformatics, 22(19), 2333-2339 [10.1093/bioinformatics/btl403].

A novel structure-based encoding for machine-learning applied to the inference of SH3 domain specificity

Ausiello, Gabriele; Helmer Citterich, Manuela
2006-10-01

1 Oct 2006
Published
International relevance
Article
Yes, but type not specified
Sector BIO/11 - Molecular Biology
English
With ISI Impact Factor
Protein Structure, Tertiary; Structure-Activity Relationship; src Homology Domains; Sequence Analysis, Protein; Binding Sites; Models, Molecular; Computer Simulation; Protein Interaction Mapping; Proteome; Models, Chemical; Artificial Intelligence; Pattern Recognition, Automated; Algorithms; Protein Binding
http://bioinformatics.oxfordjournals.org/content/22/19/2333.long
Ferraro, E; Via, A; Ausiello, G; HELMER CITTERICH, M
Journal article
Files in this item:
File: Ferraro_etal_2006.pdf
Access: open access
Description: research article
Size: 246.44 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/15338
Citations
  • PMC 12
  • Scopus 21
  • ISI 21