The BackMAP Python module: how a simpler Ramachandran number can simplify the life of a protein simulator
- Published
- Accepted
- Received
- Academic Editor
- Vladimir Uversky
- Subject Areas
- Bioinformatics, Computational Biology, Mathematical Biology
- Keywords
- Bioinformatics, Ramachandran number, Graphing tool, BackMAP, Simulation, Protein, Peptide, Backbone
- Copyright
- © 2018 Mannige
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
- Cite this article
- 2018. The BackMAP Python module: how a simpler Ramachandran number can simplify the life of a protein simulator. PeerJ 6:e5745 https://doi.org/10.7717/peerj.5745
Abstract
Protein backbones occupy diverse conformations, but compact metrics to describe such conformations and transitions between them have been missing. This report re-introduces the Ramachandran number (ℛ) as a residue-level structural metric that could simply the life of anyone contending with large numbers of protein backbone conformations (e.g., ensembles from NMR and trajectories from simulations). Previously, the Ramachandran number (ℛ) was introduced using a complicated closed form, which made the Ramachandran number difficult to implement. This report discusses a much simpler closed form of ℛ that makes it much easier to calculate, thereby making it easy to implement. Additionally, this report discusses how ℛ dramatically reduces the dimensionality of the protein backbone, thereby making it ideal for simultaneously interrogating large numbers of protein structures. For example, 200 distinct conformations can easily be described in one graphic using ℛ (rather than 200 distinct Ramachandran plots). Finally, a new Python-based backbone analysis tool—BackMAP—is introduced, which reiterates how ℛ can be used as a simple and succinct descriptor of protein backbones and their dynamics.
Introduction
Proteins are a class of biomolecules unparalleled in their functionality (Berg, Tymoczko & Stryer, 2010). A natural protein may be thought of as a linear chain of amino acids, each normally sourced from a repertoire of 20 naturally occurring amino acids. Proteins are important partially because of the structures that they access: the conformations (conformational ensemble) that a protein assumes determines the functions available to that protein. However, all proteins are dynamic: even stable proteins undergo long-range motions in their equilibrium states; that is, they have substantial diversity in their conformational ensemble (James & Tawfik, 2003a, 2003b; Oldfield et al., 2005; Tokuriki & Tawfik, 2009; Schad, Tompa & Hegyi, 2011; Vertessy & Orosz, 2011; Mannige, 2014). Additionally, a number of proteins undergo conformational transitions, without which they may not properly function. Finally, some proteins—intrinsically disordered proteins—display massive disorder with conformations that dramatically change over time (Uversky, 2003; Fink, 2005; Midic et al., 2009; Espinoza-Fonseca, 2009; Uversky & Dunker, 2010; Tompa, 2011; Sibille & Bernado, 2012; Kosol et al., 2013; Dunker et al., 2013; Geist et al., 2013; Baruah, Rani & Biswas, 2015), and whose characteristic structures are still not well-understood (Beck et al., 2008).
Large-scale changes in a protein occur due to changes in protein backbone conformations. Figure 1 is a cartoon representation of a peptide/protein backbone, with the backbone bonds themselves represented by darkly shaded bonds. Ramachandran, Ramakrishnan & Sasisekharan (1963) had recognized that the backbone conformational degrees of freedom available to an amino acid (residue) i can be almost completely described by only two dihedral angles: ϕi and ψi (Fig. 1, green arrows). Today, Ramachandran plots are used to qualitatively describe protein backbone conformations.
The Ramachandran plot is recognized as a powerful tool for two reasons: (1) it serves as a map for structural “correctness” (Laskowski et al., 1993; Hooft, Sander & Vriend, 1997; Laskowski, 2003), since many regions within the Ramachandran plot space are energetically not permitted (Momen et al., 2017); and (2) it provides a qualitative snapshot of the structure of a protein (Berg, Tymoczko & Stryer, 2010; Alberts et al., 2002; Subramanian, 2001; Lovell et al., 2003). For example, particular regions within the Ramachandran plot indicate the presence of particular locally-ordered secondary structures such as the α-helix and β-sheet (see Fig. 2A).
While the Ramachandran plot has been useful as a measure of protein backbone conformation, it is not popularly used to assess structural dynamism and transitions (unless specific knowledge exists about whether a particular residue is believed to undergo a particular structural transition). This is because of the two-dimensionality of the plot: describing the behavior of every residue involves tracking its position in two-dimensional (ϕ, ψ) space. For example, a naive description of positions of a peptide in a Ramachandran plot (Fig. 2B) needs more annotations for a per-residue analysis of the peptide backbone’s structure. Given enough residues, it would be impractical to track the position of each residue within a plot. This is compounded by time, as each point in (Fig. 2B) becomes a curve in (Fig. 2C), further confounding the situation. The possibility of picking out previously unseen conformational transitions and dynamism becomes a logistical impracticality. As indicated above, this impracticality arises primarily from the fact that the Ramachandran plot is a two-dimensional map.
Beyond Ramachandran plots, tracking changes in a protein trajectory is either overly detailed or overly holistic: an example of an overly detailed study is the tracking of exactly one or a few atoms over time (this already poses a problem, since we would need to know exactly which atoms are expected to partake in a transition); an example of a holistic metric is the radius of gyration (this also poses a problem, since we will never know which residues contribute to a change in radius of gyration without additional interrogation). With our understanding of protein dynamics undergoing a new renaissance—especially due to intrinsically disordered proteins and allostery—having hypothesis-agnostic yet detailed (residue-level) metrics of protein structure has become even more relevant. But unfortunately, there has been no single compact descriptor of protein structure. This impedes the naïve or hypothesis-free exploration of new trajectories/ensembles.
It has recently been shown that the two Ramachandran backbone parameters (ϕ, ψ) may be conveniently combined into a single number—the Ramachandran number [ℛ(ϕ, ψ) or simply ℛ]—with little loss of information (Fig. 3; Mannige, Kundu & Whitelam, 2016). In a previous report, detailed discussions were provided regarding the reasons behind and derivation of ℛ (Mannige, Kundu & Whitelam, 2016). This report provides a simpler version of the equation previously published (Mannige, Kundu & Whitelam, 2016), and further discusses how ℛ may be used to provide information about protein ensembles and trajectories. Finally, this report introduces a software package—BackMAP—that can be used to produce pictograms that describe the behavior of a protein backbone within user-inputted conformations, structural ensembles, and trajectories. Given that each pictogram provides a picture of the whole protein backbone (i.e., all ϕ and ψ angles), these pictograms are named multi-angle pictures (MAPs). BackMAP is presently available on GitHub (https://github.com/ranjanmannige/BackMAP).
Introducing the Simplified Ramachandran Number (ℛ)
The Ramachandran number is both an idea and an equation. Conceptually, the Ramachandran number (ℛ) is any closed form that collapses the dihedral angles ϕ and ψ into one structurally meaningful number (Mannige, Kundu & Whitelam, 2016). Mannige, Kundu & Whitelam (2016) presented a version of the Ramachandran number (shown in the appendix as Eq. (7)) that was complicated in closed form, thereby reducing its utility. Here, a simpler and more accurate version of the Ramachandran number is introduced. The appendix shows how this simplified form was derived from the original closed form (Eq. (7)).
Given arbitrary limits of ϕ ∈ [ϕmin, ϕmax) and ψ ∈ [ψmin, ψmax), where the minimum and maximum values differ by 360°, the most general and accurate equation for the Ramachandran number is (1)
For consistency, we maintain throughout this paper that ϕmin = ψmin = −180° or −π radians, which makes (2)
As evident in Fig. 3, the secondary structure distributions within the Ramachandran plot are faithfully reflected in corresponding distributions within Ramachandran number space. This paper shows how the Ramachandran number is both compact enough and informative enough to generate immediately useful graphs (MAPs) of a dynamic protein backbone.
Reasons to Use the Ramachandran Number
Ramachandran numbers are structurally meaningful
In addition to resolving positions of secondary structures (Fig. 3), ℛ corresponds well to structural measures such as radius of gyration (Rg), end-to-end distance (Re), and chirality (χ). These relationships are shown in Fig. 4. Note that chirality comes in many forms, for example, one could be talking about different stereo-isomers, such as L vs D amino acids, or one could be concerned with left-twisting versus right-twisting backbones, that is, handedness (Mannige, 2017). This report will primarily be focused on chirality in context of backbone twist/handedness.
The trends in Fig. 4 show that as one progresses from low to high ℛ, various structural properties also progress smoothly. Additionally, backbones that display similar ℛ also show little variation in structural properties, as evidenced by the small standard deviation bars. It is also important to note that the standard deviations shown in Fig. 4 were calculated by first populating every possible region of (ϕ, ψ)-space. However, in reality, most regions of (ϕ, ψ)-space are unoccupied due to steric/electrostatic constraints, which means that these error bars are likely to be even smaller for natural protein backbones than those depicted here. Finally, the ℛ number is calculated by taking “sweeps” of the (ϕ, ψ)-space in lines that are parallel to the negatively-sloping diagonal. Interestingly, such “sweeps” encounter only one major (dense) region within (ϕ, ψ)-space (e.g., ℛ’s in the general vicinity of 0.34 represent structures that resemble α-helices). This means that ℛ can also be used to assess the types of secondary structure present in a protein conformation.
Ramachandran codes are stackable
An important aspect of the Ramachandran number (ℛ) lies in its compactness compared to the traditional Ramachandran pair (ϕ, ψ). The value of the conversion from (ϕ, ψ)-space to ℛ-space is that the structure of a protein can be described in various one-dimensional arrays (per-structure “Ramachandran codes” or “ℛ-codes” or multi-angle maps); see, e.g., Fig. 5.
In addition to assuming a small form factor, ℛ-codes may then be stacked side-by-side for visual and computational analysis. There lies its true power.
For example, the one-ℛ-to-one-residue mapping means that the entire residue-by-residue structure of a protein can be shown using a string of ℛis (which would show regions of secondary structure and disorder, for starters). Additionally, an entire protein’s backbone makeup can be shown as a histogram in ℛ-space (which may reveal a protein’s topology). The power of this format lies not only in the capacity to distill complex structure into compact spaces, but in its capacity to display many complex structures in this format, side-by-side (stacking).
Peptoid nanosheets (Mannige et al., 2015) will be used here as an example of how multiple structures, in the form of ℛ-codes, may be stacked to provide immediately useful pictograms. Peptoids are stereo-isomers of peptides, where the sidechain is attached to the backbone nitrogen rather than the carbon atom. Since both peptoids and peptides share identical backbone connectivity, the analysis described below could be applied to both peptides and peptoids.
Peptoid nanosheets are a recently discovered peptide-mimic that, in one molecular dynamics simulation (Mannige et al., 2015), were shown to display a novel secondary structure. In the reported model (Mannige et al., 2015), each peptoid within the nanosheet displays backbone conformations that alternate in chirality, causing the backbone to look like a meandering snake that nonetheless maintains an overall linear direction. This secondary structure was discovered by first setting up a nanosheet where all peptoid backbones were restrained to be fully extended, after which the restraints were energetically softened and completely released. Figure 6A shows snapshots of each of these states. As evident in Figs. 6B and 6C, the two types of ℛ-code stacks display salient information at first glance: (1) Fig. 6B shows that the extended backbone first undergoes some rearrangement with softer restraints, and then becomes more binary in arrangement as we look down the backbone; and (2) Fig. 6C shows that lifting restraints on the backbone causes a dramatic change in backbone topology, namely a birth of a bimodal distribution evident in the two parallel horizontal bands.
By utilizing ℛ, maps such as those in Fig. 6 provide information about every ϕ and ψ within the backbone. As such, these maps are dubbed MAPs. A Python package called BackMAP created Figs. 6A and 6B, and is provided as a GitHub repository (https://github.com/ranjanmannige/BackMAP). BackMAP takes in a PDB structure file containing a single structure, or multiple structures separated by the code “MODEL.”
Case study: picking out subtle differences from high volume of data
This section expands on the notion that ℛ-numbers—due to their compactness/stackability—can be used to pick out backbone structural trends that would be hard to decipher using any other metric. For example, it is well known that prolines (pro) display unusual backbone behavior: in particular, proline backbones occupy structures that are close to but distinct from α-helical regions. Due to the two-dimensionality of Ramachandran plots (Fig. 7A), such distinctions are hard to visually pick out from Ramachandran plots. However, stacking per-amino-acid ℛ-codes side-by-side make such differences patent (Fig. 7B; see arrow).
It is also known that amino acids preceeding prolines display unusual shifts in backbone twist/chirality. For example, Fig. 8C shows that amino acids appearing before prolines behave differently than they would otherwise (see the upward-facing arrow). Additionally, amino acids following glycines also appear to have their structures modified (Fig. 8D; upward arrow). Note that these results are not new, and it has already been confirmed that, for example, nearest neighbors affect the conformational behavior of an amino acid as witnessed within Ramachandran plots (Ting et al., 2010), and proline changes the backbone conformation of the preceeding residue (Gunasekaran et al., 1998; Ho & Brasseur, 2005). However, Figs. 7 and 8 indicate that such information can be more concisely shown/identified when structures are stacked side-by-side in the form of ℛ-codes. Such subtle changes are often witnessed when protein backbones transition from one state to another.
Using the BackMAP Python Module
Installation
BackMAP may either be installed locally by downloading the GitHub repository, or installed directly by running the following line in the command prompt (assuming that pip exists): > pip install backmap.
Usage
The module can either be imported and used within existing scripts, or used as a standalone package using the command “python -m backmap.” First the in-script usage will be discussed.
In-script usage I: first simple test
The simplest test would be to generate Ramachandran numbers from (ϕ, ψ) pairs:
# Import module 1
import backmap 2
# Convert (phi, psi) to R 3
print backmap.R(phi =0, phi =0) # Expected output : 0.5 4
print backmap.R( −180, −180) # Expected output : 0.0 5
print backmap.R( 180, 180) # Expected output : 1.0 (equivalent in meaning to 0) 6
In-script usage II: basic usage for creating multi-angle pictures
The following code shows how MAPs of protein backbones can be generated:
1. Select and read a protein PDB file
Each trajectory frame must be a set of legitimate protein databank (PDB) “ATOM” records separated by “MODEL” keywords (distinct models show up as distinct frames on the x-axis or abscissa).
import backmap 1 import matplotlib.pyplot as plt # Pyplot is needed for drawing graphs 2 import numpy as np # Numpy is needed for data transformations 3 pdbfn = './tests/pdbs/1xqq.pdb' # Set pdb name 4 data = backmap.read_pdb(pdbfn) # READ PDB in the form of a matrix with columns 5
Here, “data” is a 2d array with four columns [“model,” “chain,” “resid,” “R”]. The first row of “data” is the header, with values that follow.
2. Select color scheme (color map)
In addition to custom colormaps listed in the next section, one can also use traditional colormaps available at matplotlib.org (e.g., “Reds” or “Reds_r”).
# setting the name of the colormap 6 cmap = "SecondaryStructure" 7
3. Draw per-chain MAPs
# Grouping by chain 8 grouped_data = backmap.group_data_by (data, group_by='chain', 9 columns_to_return=['model','resid','R']) 10 for chain in grouped_data.keys(): # Going through each chain 11 # Getting the X,Y,Z values for each entry 12 models, residues, Rs = grouped_data [chain] 13 # Finally, creating (but not showing) the graph 14 backmap.draw_xyz(X = models , Y = residues , Z = Rs 15 , xlabel = 'Frame #', ylabel = "Residue #",zlabel = 'ℛ' 16 ,cmap = cmap , title = "Chain: ' "+chain+" ' " 17 ,vmin=0,vmax=1) 18 # Now, we display the graph: 19 plt.show() # ... one can also use plt.savefig() to save to file 20
The code above results in Fig. 9A.
Additionally, by changing how one assigns values to “X” and “Y,” one can easily construct and draw other types of graphs such as time-resolved histograms, per-residue fluctuations when compared to the first (D1) and previous structure (D−1) within the trajectory, etc.
In-script usage III: creating custom graphs
Other types of graphs can be easily created by modifying part three of the code above. For example, the following code creates histograms of R, one for each model (starting from line 11 above).
for chain in grouped_data.keys(): 11
models, residues, Rs = grouped_data[chain] 12
13
'Begin custom code' 14
X = []; Y=[]; Z=[]; # Will set X=model, Y=R, Z=P(R) 15
# Bundling the three lists into one 2d array 16
new_data = np.array(list(zip(models , residues , Rs))) 17
# Getting all R values, model by model 18
for m in sorted(set(new_data[:,0])): # column 0 is the model column 19
# Getting all Rs for that model # 20
current_rs = new_data[np.where(new_data[:,0]==m)][:,2] # column 2 contains R 21
# Getting the histogram 22
a,b = np.histogram(current_rs,bins=np.arange(0,1.01,0.01)) 23
max_count = float(np.max(a)) 24
for i in range(len(a)): 25
X.append(m); Y.append((b[i]+b[i+1])/2.0); Z.append(a[i]/float(np.sum(a))); 26
'End custom code' 27
28
# Finally, creating (but not showing) the graph 29
backmap.draw_xyz(X = X , Y = Y , Z = Z 30
,xlabel ='Frame #', ylabel ="ℛ",zlabel ="P'(ℛ)" 31
,cmap = 'Greys', ylim=[0,1]) 32
plt.yticks(np.arange(0,1.00001,0.2)) 33
# Now, we display the graph: 34
plt.show() # ... one can also use plt.savefig() to save to file 35
The code above results in Fig. 9B.
In-script usage IV: available color schemes (CMAPs)
Aside from the general color maps (cmaps) that exist in matplotlib (e.g., “Greys,” “Reds,” or, god forbid, “jet”), BackMAP provides two new colormaps: “Chirality” (key: +ve twists: red; −ve twists: blue), and “SecondaryStructure” (key: potential α-helices: red; β-sheets: blue; ppII-helices: cyan). Right twisting backbones are shown in red; left twisting backbones are shown in blue. Figure 10 shows how a single protein ensemble may be described using these schematics. As illustrated in Fig. 10B, cmaps available within the standard matplotlib package do not distinguish between major secondary structures well, while those provided by BackMAP do. In case it is known that the protein backbone accesses non-traditional regions of the Ramachandran plot, a four-color schematic will be needed (see below for more discussions).
Stand alone usage
BackMAP can be used as a stand alone package by running “> python -m backmap -pdb <pdb_dir_or_file>.” The sectons below describe the expected outputs and how they may be interpreted.
Stand alone Example I: a stable protein
Figures 11B–11F below were created by running “> python -m backmap ./tests/pdbs/1xqq.pdb” (Fig. 11A was created using VMD). These graphs indicate that conformational ensemble 1xqq describes a conformationally stable protein, since 1) each conformation shows little change in the ℛ histogram over time (Fig. 11B), and 2) each residue fluctuates little in color (structure) over “time” (Figs. 11C–11F; see Methods). Here and below, it is assumed that discrete models represent distinct states of the protein over “time”.
In particular, each column in Fig. 11B describes the histogram in Ramachandran number (ℛ) space for a single model/timeframe. These histograms show the presence of both α-helices (at ℛ ≈ 0.34) and β-sheets (at ℛ ≈ 0.52). Additionally, Figs. 11C and 11D describe per-residue conformational plots (colored by two different metrics or CMAPs), which show that most of the protein backbone remains relatively stable over time (e.g., few fluctuations in state or “color” are evident over frame #). Finally, Fig. 11E describes the extent to which a single residue’s state has deviated from the first frame, and Fig. 11F describes the extent to which a single residue’s state has deviated from its state in the previous frame. All these graphs show that this protein is relatively conformationally stable.
Stand alone Example II: an intrinsically disrodered protein
Figure 12 is identical to Fig. 11, except that the panels pertain to an intrinsically disordered protein 2fft whose structural ensemble describes dramatically distinct conformations.
As compared to the conformationally stable protein above, protein 2fft is much more flexible. Figure 11B shows that the states accessed per model are diverse and dramatically fluctuate over the entire range of ℛ (this is especially true when compared to a stable protein, see Fig. 11B).
The diverse states occupied by each residue (Figs. 11C and 11D) confirm the conformational variation displayed by most of the backbone (Figs. 11E and 11F similarly show that most of the residues fluctuate dramatically).
Yet, interestingly, Figs. 11C–11F also show an unusually stable region—residues 15–25—which consistently display the same conformational (α-helical) state at ℛ ≈ 0.34 (interpreted as the color red in Fig. 11C). This trend would be hard to recognize by simply looking at the structural ensemble (Fig. 11A).
A signed Ramachandran number for “misbehaving” backbones
The Ramachandran number increases in value from the bottom left of the Ramachandran plot to the top right in sweeps that are parallel to the negative sloping diagonal. As discussed in Mannige, Kundu & Whitelam (2016), this method of mapping a two-dimensional space into one number is still structurally meaningful and descriptive because (1) most structural features of the protein backbone—for example, radius of gyration (Mannige, Kundu & Whitelam, 2016), end-to-end distance (Mannige, Kundu & Whitelam, 2016), and chirality (Mannige, 2017)—vary little along lines parallel to the negatively-sloping diagonal (this is indicated by relatively small standard deviations in structural metrics for similar ℛs; Fig. 4), and (2) most protein backbones display chiral centers and therefore predominantly appear on the top left region of the Ramachandran plot (above the dashed diagonal in Fig. 13A).
However, not all backbones localize in only one half of the Ramachandran plot. Particularly, among biologically relevant amino acids, glycine occupies both regions of the Ramachandran plot (Fig. 13B; of note, the αL helix region becomes relatively prominent). On the other hand, prolines are known to form polyproline-II helices (ppII in Fig. 13C), which falls on almost the same “sweep” as glycine rich peptides (red dot-dashed line). In situations where both prolines and glycines are abundant, the Ramachandran number (ℛ) would fail to distinguish αL from ppII (Fig. 13D; regions outlined by rectangles).
To accomodate the situation where achiral backbones are expected (e.g., if peptoids or polygycines are being studied), an additional Ramachandran number—the signed Ramachandran number ℛs—is introduced here. ℛs is identical to the original number in magnitude, but which changes sign from + to − as you approach ℛ numbers that are to the right (or below) the positively sloped diagonal i.e., (3) Signed Ramachandran numbers are calculated by adding “-signed” within the command line implementation or by adding “signed=True” when making backmap.R() calls.
As an example of the utility of ℛs, Fig. 13F shows that ℛs easily distinguishes αD from ppII.
Note that the signed ℛs, while useful, would be important in very limited scenarios, as more than 96% of the amino acids in the PDB occupy the upper-left region of the Ramachandran plot (with 3% of “rule breakers” contributed mostly by glycines).
Conclusion
A simpler Ramachandran number is reported——which, while being a single number, provides much information. For example, for proteins within the protein databank, ℛ values above 0.5 are left-handed in twist, while those below 0.5 are right handed; ℛ values close to 0, 0.5, and 1 are extended; β-sheets occupy ℛ values at around 0.52; and right-handed α-helices hover around 0.34. Given the Ramachandran number’s “stackability,” single graphs can hold detailed information of the progression/evolution of molecular trajectories. Indeed, Fig. 8 shows how 400 distinct Ramachandran plots can easily be fit into one graph when using ℛ. Finally, a python script/module (BackMAP) has been provided in an online GitHub repository to promote the utility of ℛ as a universal metric.
Materials
Statistics about single amino acid conformations and secondary structures (excepting ppII) were derived from the Structural Classification of Proteins or SCOPe website (Release 2.06; Fox, Brenner & Chandonia, 2014). This database, currently available at http://scop.berkeley.edu/downloads/pdbstyle/pdbstyle-sel-gs-bib-40-2.06.tgz, contains 13,760 three-dimensional protein conformations (one domain per conformation) with lower than 40% sequence identity. Secondary structure annotations were assigned using the DSSP algorithm (Kabsch & Sander, 1983), although the STRIDE algorithm (Frishman & Argos, 1995) provides qualitatively identical distributions. These statistics were used to produce distributions within Figs. 2A, 3A and 3C.
Given the absence of ppII annotation in the present version of DSSP, statistics for ppII (used to generate the ppII distributions in Figs. 2A, 3A and 3C) were obtained from segments within 16,535 proteins annotated by PolyprOnline (Chebrek et al., 2014) to contain three or more residues of the secondary structure.
Figure 6 represents a trajectory of a portion of a single peptoid backbone within a “relaxing” peptoid nanosheet bilayer. The conformation of this backbone—derived from work by Mannige et al. (2015) and Mannige, Kundu & Whitelam (2016)—is also available as “/tests/pdbs/nanosheet_birth_U7.pdb” within the companion GitHub repository.
The following protein structures were obtained from the PDB: 1mba, 2acy, 1xqq, and 2fft. The first two in the list (1mba, 2acy) describe single conformations and the last two (1xqq, 2fft) describe ensembles. ℛ-based MAPs were created for each structure X ∈ [nanosheet_birth_U7.pdb,2fft,2acy,1xqq,1mba] using the following command line code:
> python -m backmap -pdb tests/pdbs/X.pdb
The output of this command line implementation were used in Figs. 5B, 6B and 10B, 11B and 12B.
In order to describe changes in structure, this report uses two metrics for structural deviation: deviation in present structure when compared to the first conformation in the trajctory (D1), and deviation in structure compared to the previous conformation in the trajctory (D−1). For any residue r at time t, these equations can be described as follows: (4)
All three-dimensional representations of proteins (Figs. 5A, 6A and 10A, 11A and 12A) were created using VMD (Humphrey, Dalke & Schulten, 1996). Finally, all other figures—excepting Fig. 1 that is derived from Mannige, Kundu & Whitelam (2016)—were created using helper Python scripts available in manuscript/python_generators/ within the companion GitHub repository.
Appendix
Simplifying the Ramachandran number (ℛ)
This section will derive the simplified Ramachandran number presented in this paper from the more complicated looking Ramachandran number introduced previously (Mannige, Kundu & Whitelam, 2016).
Assuming the bounds ϕ ∈ [ϕmin, ϕmax) and ϕ ∈ [ψmin, ψmax), the previously described Ramachandran number takes the form (5) where, is the Ramachanran number with range (0, 1), and is the unnormalized integer-spaced Ramachandran number whose closed form is (6)
Here, [x] rounds x to the closest integer value, σ is a scaling factor, discussed below, and λ is the range of an angle in degrees (i.e., λ = ϕmax − ϕmin). Effectively, this equation does the following: (1) It divides up the Ramachandran plot into (360° σ1/°)2 squares, where σ is a user-selected scaling factor that is measured in reciprocal degrees (see Fig. 8B in Mannige, Kundu & Whitelam (2016)); (2) It then assigns integer values to each square by setting the lowest integer value to the bottom left of the Ramachandran plot (ϕ = −180°, ψ = −180°), and proceeding from the bottom left to the top right of the Ramachandran plot by iteratively slicing down −1/2 sloped lines and assigning increasing integer values to each square that one encounters; (3) Finally, the equation assigns any (ϕ, ψ) pair within ϕ, ψ ∈ [−ϕmin, ϕmax) to the integer value (Rℤ) assigned to the divvied-up square that they it exists in. Further discussions on this process are available in the previous publication (Mannige, Kundu & Whitelam, 2016).
Combining the two equations (Eqs. (5) and (6)) results in the following, rather imposing, equation for the Ramachandran number: (7)
However, useful Eq. (7) is, the complexity of the equation may be a deterrent toward utilizing it. This paper reports a simpler equation that is derived by taking the limit of Eq. (7) as σ tends toward ∞. In particular, when σ→∞, Eq. (7) becomes (8)
Assuming that ϕ, ψ ∈ [−180°, 180°) or [−π, π), (9)
Conformation of this limit is shown numerically in Fig. A1. Since larger σs indicate higher accuracy, represents an exact representation of the Ramachandran number. Using this closed form, this report shows how both static structural features and complex structural transitions may be identified with the help of Ramachandran number-derived plots.
Assuming, a different range (say, ϕ, ψ ∈ [0, 2π)), the Ramachandran number in that frame of reference will be (10) However, in changing the ranges, the meaning of the Ramachandran number will change. This manuscript assumes that all angles (ϕ, ψ, ω) range between −π (−180°) and π (180°).