David B. Dahl، نويسنده , , Zach Bohannan، نويسنده , , Qianxing Mo، نويسنده , , Marina Vannucci، نويسنده , , Jerry Tsai، نويسنده ,
Grouping the 20 residues is a classic strategy to discover ordered patterns and insights about the fundamental nature of proteins, their structure, and how they fold. Usually, this categorization is based on the biophysical and/or structural properties of a residueʹs side-chain group. We extend this approach to understand the effects of side chains on backbone conformation and to perform a knowledge-based classification of amino acids by comparing their backbone ϕ,ψ distributions in different types of secondary structure. At this finer, more specific resolution, torsion angle data are often sparse and discontinuous (especially for nonhelical classes) even though a comprehensive set of protein structures is used. To ensure the precision of Ramachandran plot comparisons, we applied a rigorous Bayesian density estimation method that produces continuous estimates of the backbone ϕ,ψ distributions. Based on this statistical modeling, a robust hierarchical clustering was performed using a divergence score to measure the similarity between plots. There were seven general groups based on the clusters from the complete Ramachandran data: nonpolar/β-branched (Ile and Val), AsX (Asn and Asp), long (Met, Gln, Arg, Glu, Lys, and Leu), aromatic (Phe, Tyr, His, and Cys), small (Ala and Ser), bulky (Thr and Trp), and, lastly, the singletons of Gly and Pro. At the level of secondary structure (helix, sheet, turn, and coil), these groups remain somewhat consistent, although there are a few significant variations. Besides the expected uniqueness of the Gly and Pro distributions, the nonpolar/β-branched and AsX clusters were very consistent across all types of secondary structure. Effectively, this consistency across the secondary structure classes implies that side-chain steric effects strongly influence a residueʹs backbone torsion angle conformation. These results help to explain the plasticity of amino acid substitutions on protein structure and should help in protein design and structure evaluation.
Clustering , Residue backbone similarity , Bayesian density estimation , Ramachandran plot , Torsion angles