Journal of Molecular Biology
Regular article
A fast method to predict protein interaction sites from sequences¹
Introduction
Protein interaction sites are critical domains for the selective recognition of molecules and for the formation of complexes, and they are responsible for diverse important biological functions. Detecting interaction domains in sequences could therefore help to identify protein function. It could also help, for example, to validate functional hypotheses through the design of restricted fragments for two-hybrid assays (Vidal et al., 1996) or of specific mutagenesis experiments (Phizicky & Fields, 1995). Computational methods are of great interest for predicting interacting protein pairs, and thus for constructing metabolic pathways or signalling cascades for recently sequenced genomes. The prediction of interaction sites should also be a good starting point for identifying pharmacological targets and for drug design studies. Such analyses require the elaboration of docking procedures (Janin, 1995; Shoichet & Kuntz, 1996; Sternberg et al., 1998), knowledge of protein and ligand structures (Bamborough & Cohen, 1996) and the consideration of conformational changes (Betts & Sternberg, 1999).
Several methods start from protein structure; they identify interaction domains by analysing the hydrophobicity, solvation, protrusion and accessibility of residues (Young et al., 1994; Jones & Thornton, 1997a,b). These approaches are interesting but cannot meet the needs of the many biochemists who have sequences but no structural data. Indeed, despite the number of protein structures already solved, the structure database remains very small compared with the sequence databases.
To our knowledge, very few methods use sequences as their starting point. The algorithm of Kini & Evans (1995) rests on the observation that proline residues frequently occur near interaction sites, at a frequency 2.5 times higher than expected from a random distribution. They suggest that “proline-brackets” flank a large number of protein-protein interaction sites (Kini & Evans, 1996). Another method uses multiple sequence alignments and focuses on correlated mutations to detect protein interaction sites (Pazos et al., 1997). The hypothesis is that residues close to protein-protein interaction sites tend to mutate simultaneously during evolution. From multiple sequence alignments, the authors therefore detect the residues that link different protein domains and that interact in heterodimer complexes.
In a recent analysis, Marcotte et al. (1999) report that they can predict which proteins interact by analysing genome sequences. The hypothesis is that two proteins interact if, in another organism, they are assembled as a single protein. The procedure is also very powerful for predicting the functions of large protein complexes, provided that domain homologies can be traced. However, it gives no information on the interacting amino acids per se.
Here, we test a fast and simple method to predict stretches of protein interaction sites from sequences in the absence of any structural data. Eisenberg et al. (1982) previously showed that plotting the mean alpha-helical hydrophobic moment 〈μH〉 versus the mean hydrophobicity 〈H〉 allows protein fragments to be classified according to their location in the structures: they are either membrane segments, parts of globular domains or surface-seeking helices. The authors demonstrated that a high hydrophobicity together with a low hydrophobic moment indicates that the fragment is membranous, whereas residues from surface-seeking helices cover a wide diagonal area beginning at the upper left of the plot (Figure 1). The diagram was thus divided into four regions corresponding to globular, surface and membrane (monomeric and multimeric) domains, called G, S and M, respectively (Eisenberg et al., 1984; see Figure 1(a)). Here, we investigate a fifth domain, the “receptor-binding domain” (RBD), in which we detect some residues of protein interaction sites. The RBD method is described and applied to different sequence databases. The results show that the plot derived from Eisenberg’s method detects most of the experimentally known interaction sites. The effects of several parameters of the procedure were tested. The structure, accessibility and functional characterisation of the predicted sites were also investigated on a few 3D structures. The results obtained with the DNA-binding and calcium-binding sequences, and with 3D structures such as the ultrabithorax-extradenticle-DNA complex and the calcium-binding protein, demonstrate that our procedure can detect various types of interaction sites as long as they involve hydrophilic residues. Finally, we demonstrate that the RBD analysis could be valuable in identifying mutations. Two examples are shown, in the Mason-Pfizer monkey virus Gag protein and in a penicillin-binding protein.
Apolipoprotein E and Newcastle disease virus fusion protein analysis
In their analysis of the apolipoprotein E sequence, De Loof et al. (1986) extended the concept previously suggested by Eisenberg by considering an additional region of the hydrophobicity/hydrophobic moment plot, which they called the “receptor-binding domain” (RBD). The RBD method is thus based on calculating the mean hydrophobic moment 〈μH〉 and the mean hydrophobicity 〈H〉 of an N-residue window (N being odd) centred on the amino acid of interest. The δ angle is 100° to correspond to the
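The windowed calculation of 〈H〉 and 〈μH〉 can be illustrated with a minimal Python sketch. It assumes the usual Eisenberg definition of the hydrophobic moment, in which residue n of the window contributes its hydrophobicity at an angle n·δ, and it divides the moment by N to obtain a per-residue mean. The function name `window_profile`, the default window of 11 residues, the example peptide and the choice of the Eisenberg consensus hydrophobicity scale are assumptions made for illustration; this excerpt does not state which scale or window length the authors used, and the boundaries of the G, S, M and RBD regions are not reproduced here.

```python
import math

# Eisenberg consensus hydrophobicity scale (assumed here for illustration;
# the scale actually used by the authors is not given in this excerpt).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def window_profile(sequence, window=11, delta_deg=100.0, scale=EISENBERG):
    """Return a list of (position, mean_H, mean_muH) tuples.

    One tuple is produced for every residue that can sit at the centre of a
    complete window; positions are 1-based. `window` must be odd so that the
    window is centred on a residue.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    delta = math.radians(delta_deg)   # angle between successive side chains
    half = window // 2
    values = [scale[aa] for aa in sequence.upper()]

    profile = []
    for centre in range(half, len(values) - half):
        segment = values[centre - half:centre + half + 1]
        # Mean hydrophobicity of the window.
        mean_h = sum(segment) / window
        # Hydrophobic moment: length of the vector sum of the residue
        # hydrophobicities placed at successive angles n * delta.
        sin_sum = sum(h * math.sin(n * delta) for n, h in enumerate(segment))
        cos_sum = sum(h * math.cos(n * delta) for n, h in enumerate(segment))
        mean_mu = math.hypot(sin_sum, cos_sum) / window
        profile.append((centre + 1, mean_h, mean_mu))
    return profile


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the call.
    for pos, mean_h, mean_mu in window_profile("MKRLLEELKKRAEELLKRSEEG"):
        print(f"{pos:3d}  <H> = {mean_h:6.3f}  <muH> = {mean_mu:6.3f}")
```

Each (〈H〉, 〈μH〉) pair gives one point on the hydrophobic moment plot. Assigning a point to a region would additionally require the region boundaries from the original figure.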
Acknowledgements
The authors wish to thank J.M. Ghuysen and M. Nguyen-Distèche for their contribution and discussion during the analysis of PBP3. We are also grateful to A. Burny, F. Bex and S. Arnould for their constructive discussion about the M-PMV Gag protein and to M.R. Conte for kindly providing the atomic coordinates of the structure. We acknowledge R.M. Kini for access to his database of known interaction sites. X.G. is supported by the Interuniversity Poles of Attraction Programme-Belgian State,
References (35)
- et al. Modeling protein-ligand complexes. Curr. Opin. Struct. Biol. (1996)
- Differentiation of lipid-associating helices by use of three-dimensional molecular hydrophobicity potential calculations. J. Biol. Chem. (1991)
- et al. Binding of a high reactive heparin to human apolipoprotein E: identification of two heparin-binding domains. Biochem. Biophys. Res. Commun. (1986)
- et al. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J. Mol. Biol. (1984)
- Protein-protein recognition. Prog. Biophys. Mol. Biol. (1995)
- et al. Analysis of protein-protein interaction sites using surface patches. J. Mol. Biol. (1997)
- et al. Prediction of protein-protein interaction sites using surface patches. J. Mol. Biol. (1997)
- et al. A hypothetical structural role for proline residues in the flanking segments of protein-protein interaction sites. Biochem. Biophys. Res. Commun. (1995)
- et al. Prediction of potential protein-protein interaction sites from amino acid sequence. Identification of a fibrin polymerization site. FEBS Letters (1996)
- et al. Plasma lipoproteins: apolipoprotein structure and function. J. Lipid Res. (1984)
- Correlated mutations contain information about protein-protein interaction. J. Mol. Biol.
- WinMGM: a fast CPK molecular graphics program for analyzing molecular structure. J. Mol. Graphics
- A single amino acid substitution within the matrix protein of a type D retrovirus converts its morphogenesis to that of a type C retrovirus. Cell
- Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol.
- Predicting the structure of protein complexes: a step in the right direction. Chem. Biol.
- Predictive docking of protein-protein and protein-DNA complexes. Curr. Opin. Struct. Biol.
- Human apolipoprotein E. Determination of the heparin binding sites of apolipoprotein E3. J. Biol. Chem.
¹ Edited by B. Holland