Force Fields and Interactions
Overview
Teaching: 30 min
Exercises: 0 minQuestions
What is Molecular Dynamics and how can I benefit from using it?
What is a force field?
What mathematical energy terms are used in biomolecular force fields?
Objectives
Understand strengths and weaknesses of MD simulations
Understand how the interactions between particles in a molecular dynamics simulation are modeled
Introduction
- Atoms and molecules interact with each other.
- We carry out molecular modeling by following and analyzing dynamic structural models in computers.
Figure from: AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics
- The size and the length of MD simulations has been recently vastly improved.
- Longer and larger simulations allow us to tackle wider range of problems under a wide variety of conditions.
Recent example - simulation of the whole SARS-CoV-2 virion
- System size: 304,780,149 atoms, 350 Å × 350 Å lipid bilayer, simulation time 84 ns
Figure from AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics
- Showed that spike glycans can modulate the infectivity of the virus.
- Characterized interactions between the spike and the human ACE2 receptor.
- Used ML to identify conformational transitions between states and accelerate conformational sampling.
Goals of the Workshop
- Introduce you to the method of molecular dynamics simulations.
- Guide you to using various molecular dynamics simulation packages and utilities.
- Teach how to use Digital Research Alliance of Canada (Alliance) clusters for system preparation, simulation and trajectory analysis.
The focus will be on reproducibility and automation by introducing scripting and batch processing.
The theory behind the method of MD.
Force Fields
- Understanding complex biological phenomena requires simulations of large systems for a long time windows.
- The forces acting between atoms and molecules are very complex.
- Very fast method of evaluations molecular interactions is needed to achieve these goals.
Interactions are approximated with a simple empirical potential energy function.
- The potential energy function allows calculating forces: $ \vec{F}=-\nabla{U}(\vec{r}) $
- With the knowledge of the forces acting on an object, we can calculate how the position of that object changes over time: $ \vec{F}=\frac{d\vec{p}}{dt} $.
- Advance system with very small time steps assuming the velocities don’t change.
Flow diagram of MD simulation
A force field is a set of empirical energy functions and parameters used to calculate the potential energy U as a function of the molecular coordinates.
- Potential energy function used in MD simulations is composed of non-bonded and bonded interactions:
$U(\vec{r})=\sum{U_{bonded}}(\vec{r})+\sum{U_{non-bonded}}(\vec{r})$
- Only pairwise interactions are considered.
Classification of force fields.
Class 1 force fields.
- Dynamics of bond stretching and angle bending is described by simple harmonic motion (quadratic approximation)
- Correlations between bond stretching and angle bending are omitted.
Examples: AMBER, CHARMM, GROMOS, OPLS
Class 2 force fields.
- Add anharmonic cubic and/or quartic terms to the potential energy for bonds and angles.
- Contain cross-terms describing the coupling between adjacent bonds, angles and dihedrals.
Class 3 force fields.
- Explicitly add special effects of organic chemistry such as polarization, stereoelectronic effects, electronegativity effect, Jahn–Teller effect, etc.
Examples of class 3 force fields are: AMOEBA, DRUDE
Energy Terms of Biomolecular Force Fields
Non-Bonded Terms
- Describe non-elecrostatic and electrostatic interactions between all pairs of atoms.
- Non-elecrostatic potential energy is most commonly described with the Lennard-Jones potential.
The Lennard-Jones potential
- Approximates the potential energy of non-elecrostatic interaction between a pair of non-bonded atoms or molecules:
$V_{LJ}(r)=\frac{C12}{r^{12}}-\frac{C6}{r^{6}}$
- The \(r^{-12}\) term approximates the strong Pauli repulsion originating from overlap of electron orbitals.
- The \(r^{-6}\) term describes weaker attractive forces acting between local dynamically induced dipoles in the valence orbitals.
- The too steep repulsive part often leads to an overestimation of the pressure in the system.
- The LJ potential is commonly expressed in terms of the well depth \(\epsilon\) and the van der Waals radius \(\sigma\):
$V_{LJ}(r)=4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12}-\left(\frac{\sigma}{r}\right)^{6}\right]$
- Relation between C12, C6, \(\epsilon\) and \(\sigma\):
$C12=4\epsilon\sigma^{12},C6=4\epsilon\sigma^{6}$
The Lennard-Jones Combining Rules
- The LJ interactions between different types of atoms are computed by combining the LJ parameters.
- Avoid huge number of parameters for each combination of different atom types.
- Different force fields use different combining rules.
- The arithmetic mean (Lorentz) is motivated by collision of hard spheres
- The geometric mean (Berthelot) has little physical argument.
Geometric mean:
\(C12_{ij}=\sqrt{C12_{ii}\times{C12_{jj}}}\qquad C6_{ij}=\sqrt{C6_{ii}\times{C6_{jj}}}\qquad\) (GROMOS)
\(\sigma_{ij}=\sqrt{\sigma_{ii}\times\sigma_{jj}}\qquad\qquad\qquad \epsilon_{ij}=\sqrt{\epsilon_{ii}\times\epsilon_{jj}}\qquad\qquad\) (OPLS)
Lorentz–Berthelot:
\(\sigma_{ij}=\frac{\sigma_{ii}+\sigma_{jj}}{2},\qquad \epsilon_{ij}=\sqrt{\epsilon_{ii}\times\epsilon_{jj}}\qquad\) (CHARM, AMBER).
- Known issues: overestimates the well depth
Less common combining rules.
Waldman–Hagler:
\(\sigma_{ij}=\left(\frac{\sigma_{ii}^{6}+\sigma_{jj}^{6}}{2}\right)^{\frac{1}{6}}\) , \(\epsilon_{ij}=\sqrt{\epsilon_{ij}\epsilon_{jj}}\times\frac{2\sigma_{ii}^3\sigma_{jj}^3}{\sigma_{ii}^6+\sigma_{jj}^6}\)
This combining rule was developed specifically for simulation of noble gases.
Hybrid (the Lorentz–Berthelot for H and the Waldman–Hagler for other elements). Implemented in the AMBER-ii force field for perfluoroalkanes, noble gases, and their mixtures with alkanes.
The Buckingham potential
- Replaces the repulsive \(r^{-12}\) term in Lennard-Jones potential with exponential function of distance:
$V_{B}(r)=Aexp(-Br) -\frac{C}{r^{6}}$
- Exponential function describes electron density more realistically
- Computationally more expensive to calculate.
- Risk of “buckingham catastrophe” at short distances.
There is only one combining rule for Buckingham potential in GROMACS:
$A_{ij}=\sqrt{(A_{ii}A_{jj})}$
$B_{ij}=2/(\frac{1}{B_{ii}}+\frac{1}{B_{jj}})$
$C_{ij}=\sqrt{(C_{ii}C_{jj})}$
Combining rule (GROMACS):
\(A_{ij}=\sqrt{(A_{ii}A_{jj})} \qquad B_{ij}=2/(\frac{1}{B_{ii}}+\frac{1}{B_{jj}}) \qquad C_{ij}=\sqrt{(C_{ii}C_{jj})}\)
Specifying Combining Rules
GROMACS
Combining rule is specified in the [defaults] section of the forcefield.itp file (in the column ‘comb-rule’).
[ defaults ] ; nbfunc comb-rule gen-pairs fudgeLJ fudgeQQ 1 2 yes 0.5 0.8333
Geometric mean is selected by using rules 1 and 3; Lorentz–Berthelot rule is selected using rule 2.
GROMOS force field requires rule 1; OPLS requires rule 3; CHARM and AMBER require rule 2
The type of potential function is specified in the ‘nbfunc’ column: 1 selects Lennard-Jones potential, 2 selects Buckingham potential.
NAMD
By default, Lorentz–Berthelot rules are used. Geometric mean can be turned on in the run parameter file:
vdwGeometricSigma yes
The electrostatic potential
- Point charges are assigned to the positions of atomic nuclei to approximate the electrostatic potential around a molecule.
- The Coulomb’s law: $V_{Elec}=\frac{q_{i}q_{j}}{4\pi\epsilon_{0}\epsilon_{r}r_{ij}}$
Short-range and Long-range Interactions
- Interaction is short-range if the potential decreases faster than r-3
- The Lennard-Jones interactions are short-ranged, r-6.
- The Coulomb interactions are long-ranged, r-1.
Bonded Terms
The bond potential
- Oscillation about an equilibrium bond length r0 with bond constant kb: $V_{Bond}=k_b(r_{ij}-r_0)^2$
- Poor approximation at extreme stretching, but it works well at moderate temperatures.
The angle potential
- Oscillation about an equilibrium angle \(\theta_{0}\) with force constant \(k_\theta\): $V_{Angle}=k_\theta(\theta_{ijk}-\theta_0)^2$
- The force constants for angle potential are about 5 times smaller that for bond stretching.
The torsion (dihedral) angle potential
- Defined for every 4 sequentially bonded atoms.
- Sum of any number of periodic functions, n - periodicity, \(\delta\) - phase shift angle.
$V_{Dihed}=k_\phi(1+cos(n\phi-\delta)) + …$
- n represents the number of potential maxima or minima generated in a 360° rotation.
- Combination of n=2 and n=3 dihedrals used to reproduce cis/trans and trans/gauche energy differences in ethylene glycol
The improper torsion potential
- Also known as ‘out-of-plane bending’
- Defined for a group of 4 bonded atoms where the central atom i is connected to the 3 peripheral atoms j,k, and l.
- Used to enforce planarity.
- Given by a harmonic function: $V_{Improper}=k_\phi(\phi-\phi_0)^2$
Where the dihedral angle \(\phi\) is the angle between planes ijk and ijl.
- The dihedral angle \(\phi\) is the angle between planes ijk and ijl.
Coupling Terms
The Urey-Bradley potential
- Coupling between bond length and bond angle is described by the Urey-Bradley potential.
- The Urey-Bradley term is defined as a spring between the outer atoms of a bonded triplet.
- Approximated by a harmonic function: $V_{UB}=k_{ub}(r_{jk}-r_{ub})^2$
- Improve agreement with vibrational spectra.
- Do not affect overall conformational sampling.
- Implemented in CHARMM and AMOEBA force fields.
CHARMM CMAP correction potential
- Peptide torsion angles: phi, psi, omega.
- A protein can be seen as a series of linked sequences of peptide units which can rotate around phi/psi angles.
- phi/psi angles define the conformation of the backbone.
- phi/psi dihedral angle potentials correct for force field deficiencies such as errors in non-bonded interactions, electrostatics, lack of coupling terms, inaccurate combination, etc.
- CMAP potential was developed to improve the sampling of backbone conformations.
- CMAP parameter does not define a continuous function.
- it is a grid of energy correction factors defined for each pair of phi/psi angles typically tabulated with 15 degree increments.
Energy scale of potential terms
\(k_BT\) at 298 K ~ 0.593 \(\frac{kcal}{mol}\) Bond vibrations ~ 100 ‐ 500 \(\frac{kcal}{mol \cdot \unicode{x212B}^2}\) Bond angle bending ~ 10 - 50 \(\frac{kcal}{mol \cdot deg^2}\) Dihedral rotations ~ 0 - 2.5 \(\frac{kcal}{mol \cdot deg^2}\) van der Waals ~ 0.5 \(\frac{kcal}{mol}\) Hydrogen bonds ~ 0.5 - 1.0 \(\frac{kcal}{mol}\) Salt bridges ~ 1.2 - 2.5 \(\frac{kcal}{mol}\)
Exclusions from Non-Bonded Interactions
- In pairs of atoms connected by chemical bonds bonded energy terms replace non-bonded interactions.
- All pairs of connected atoms separated by up to 2 bonds (1-2 and 1-3 pairs) are excluded from non-bonded interactions. It is assumed that they are properly described with bond and angle potentials.
- 1-4 interaction represents a special case where both bonded and non-bonded interactions are required for a reasonable description. However, due to the short distance between the 1–4 atoms full strength non-bonded interactions are too strong and must be scaled down.
- Non-bonded interaction between 1-4 pairs depends on the specific force field.
- Some force fields exclude VDW interactions and scale down electrostatic (AMBER) while others may modify both or use electrostatic as is.
What Information Can MD Simulations Provide?
With the help of MD it is possible to model phenomena that cannot be studied experimentally. For example
- Understand atomistic details of conformational changes, protein unfolding, interactions between proteins and drugs
- Study thermodynamics properties (free energies, binding energies)
- Study biological processes such as (enzyme catalysis, protein complex assembly, protein or RNA folding, etc).
For more examples of the types of information MD simulations can provide read the review article: Molecular Dynamics Simulation for All.
Challenge: Counting Non-Bonded Interactions
How many non-bonded interactions are in the system with ten Argon atoms? 10, 45, 90, or 200?
Solution
Argon atoms are neutral, so there is no Coulomb interaction. Atoms don’t interact with themselves and the interaction ij is the same as the interaction ji. Thus the total number of pairwise non-bonded interactions is (10x10 - 10)/2 = 45.
Challenge: Exclusions
How many 1-4 interactions between carbon atoms are in the system composed of 100 pentane (H3C-CH2-CH2-CH2-CH3) molecules?
Solution
200
Challenge: Interaction potentials
Which potential term describes the interaction between two molecules? CMAP, Urey-Bradley, Lennard-Jones or angle?
Solution
The Lennard-Jones potential
Specifying Exclusions
GROMACS
The exclusions are generated by grompp as specified in the [moleculetype] section of the molecular topology .top file:
[ moleculetype ] ; name nrexcl Urea 3 ... [ exclusions ] ; ai aj 1 2 1 3 1 4
In the example above non-bonded interactions between atoms that are no farther than 3 bonds are excluded (nrexcl=3). Extra exclusions may be added explicitly in the [exclusions] section.
The scaling factors for 1-4 pairs, fudgeLJ and fudgeQQ, are specified in the [defaults] section of the forcefield.itp file. While fudgeLJ is used only when gen-pairs is set to ‘yes’, fudgeQQ is always used.
[ defaults ] ; nbfunc comb-rule gen-pairs fudgeLJ fudgeQQ 1 2 yes 0.5 0.8333
NAMD
Which pairs of bonded atoms should be excluded is specified by the exclude parameter.
Acceptable values: none, 1-2, 1-3, 1-4, or scaled1-4exclude scaled1-4 1-4scaling 0.83 scnb 2.0
If scaled1-4 is set, the electrostatic interactions for 1-4 pairs are multiplied by a constant factor specified by the 1-4scaling parameter. The LJ interactions for 1-4 pairs are divided by scnb.
Key Points
Molecular dynamics simulates atomic positions in time using forces acting between atoms
The forces in molecular dynamics are approximated by simple empirical functions
Fast Methods to Evaluate Forces
Overview
Teaching: 30 min
Exercises: 0 minQuestions
Why the computation of non-bonded interactions is the speed limiting factor?
How are non-bonded interactions computed efficiently?
What is a cutoff distance?
How to choose the appropriate cutoff distance?
Objectives
Understand how non-bonded interactions are truncated.
Understand how a subset of particles for calculation of short-range non-bonded is selected.
Learn how to choose the appropriate cutoff distance and truncation method.
Learn how to select cutoff distance and truncation method in GROMACS and NAMD.
Challenges in calculation of non bonded interactions.
- The number of non-bonded interactions increases as the square of the number of atoms.
- The most computationally demanding part of a molecular dynamics simulation is the calculation of the non-bonded interactions.
- Simulation can be significantly accelerated by limiting the number of evaluated non bonded interactions.
- Exclude pairs of atoms separated by long distance.
- Maintain a list of all particles within a predefined cutoff distance of each other.
Neighbour Searching Methods
- Divide simulation system into grid cells - cell lists.
- Compile a list of neighbors for each particle by searching all pairs - Verlet lists.
Cell Lists
- Divide the simulation domain into cells with edge length greater or equal to the cutoff distance.
- Interaction potential of a particle is the sum of the pairwise interactions with all other particles in the same cell and all neighboring cells.
Verlet Lists
- Verlet list stores all particles within the cutoff distance plus some extra buffer distance.
- All pairwise distances must be evaluated.
- List is valid until any particle has moved more than half of the buffer distance.
- Efficient computation of pairwise interactions.
- Relatively large memory requirement.
- In practice, almost all simulations use a combination of spatial decomposition and Verlet lists.
Problems with Truncation of Lennard-Jones Interactions and How to Avoid Them?
- LJ potential is always truncated at the cutoff distance.
- Truncation introduces a discontinuity in the potential energy.
- A sharp change in potential may result in nearly infinite forces.
Figure 1. The Distance Dependence of Potential and Force for Different Truncation Methods
Shifted potential | |
$\circ$ Shift the whole potential uniformly by adding a constant at values below cutoff. $\circ$ Avoids infinite forces. $\circ$ Does not change forces at the distances below cutoff. $\circ$ Introduces a discontinuity in the force at the cutoff distance. $\circ$ Modifies total potential energy. |
|
Shifted Force | |
$\circ$ Shift the whole force so that it vanishes at the cutoff distance. $\circ$ Modifies equations of motion at all distances. $\circ$ Better results at shorter cutoff values compared to the potential shift. |
|
Switching Function | |
$\circ$ Modify the shape of the potential function near cutoff. $\circ$ Forces are modified only near the cutoff boundary and they approach zero smoothly. |
How to Choose the Appropriate Cutoff Distance?
- A common practice is to truncate at 2.5 \(\sigma\).
- At this distance, the LJ potential is about 1/60 of the well depth \(\epsilon\).
- The choice of the cutoff distance depends on the force field and atom types.
For example for the O, N, C, S, and P atoms in the AMBER99 force field the values of \(\sigma\) are in the range 1.7-2.1, while for the Cs ions \(\sigma=3.4\). Thus the minimum acceptable cutoff, in this case, is 8.5.
- Increasing cutoff does not necessarily improve accuracy.
- Each force field has been developed using a certain cutoff value, and effects of the truncation were compensated by adjustment of some other parameters.
- To ensure consistency and reproducibility of simulation you should choose the cutoff appropriate for the force field:
Table 1. Cutoffs Used in Development of the Common Force Fields
AMBER | CHARMM | GROMOS | OPLS |
---|---|---|---|
8 Å, 10 Å (ff19SB) | 12 Å | 14 Å | 11-15 Å (depending on a molecule size) |
Properties that are very sensitive to the choice of cutoff
- the surface tension (Ahmed, 2010),
- the solid–liquid coexistence line (Grosfils, 2009),
- the lipid bi-layer properties (Huang, 2014),
- the structural properties of proteins (Piana, 2012).
For such quantities even a cutoff at 2.5 \(\sigma\) gives inaccurate results, and in some cases the cutoff larger than 6 \(\sigma\) was required for reliable simulations (Grosfils, 2009).
Effect of cutoff on energy conservation
- Short cutoff may lead to an increase in the temperature of the system over time.
- The best practice is to carry out trial simulations without temperature control to test it for energy leaks or sources before a production run.
Truncation of the Electrostatic Interactions
- The electrostatic interaction is divided into two parts: a short-range and a long-range.
- The short-range contribution is calculated by exact summation.
- The forces beyond the cutoff radius are approximated using Particle-Mesh Ewald (PME) method.
Challenge: Truncation of VDW Interactions
Which truncation method modifies VDW interactions only near the cut-off distance?
Solution
Switching functions
Selecting Neighbour Searching Methods
GROMACS
Neighbour searching is specified in the run parameter file mdp.
cutoff-scheme = group ; Generate a pair list for groups of atoms. Since version 5.1 group list has been deprecated and only Verlet scheme is available. cutoff-scheme = Verlet ; Generate a pair list with buffering. The buffer size is automatically set based on verlet-buffer-tolerance, unless this is set to -1, in which case rlist will is used. ; Neighbour search method. ns-type = grid ; Make a grid in the box and only check atoms in neighboring grid cells. ns-type = simple ; Loop over every atom in the box. nstlist = 5 ; Frequency to update the neighbour list. If set to 0 the neighbour list is constructed only once and never updated. The default value is 10.
NAMD
When run in parallel NAMD uses a combination of spatial decomposition into grid cells (patches) and Verlet lists with extended cutoff distance.
stepspercycle 10 # Number of timesteps in each cycle. Each cycle represents the number of timesteps between atom reassignments. Default value is 20. pairlistsPerCycle 2 # How many times per cycle to regenerate pairlists. Default value is 2.
Selecting LJ Potential Truncation Method
GROMACS
Truncation of LJ potential is specified in the run parameter file mdp.
vdw-modifier = potential-shift ; Shifts the VDW potential by a constant such that it is zero at the rvdw. vdw-modifier = force-switch ; Smoothly switches the forces to zero between rvdw-switch and rvdw. vdw-modifier = potential-switch ; Smoothly switches the potential to zero between rvdw-switch and rvdw. vdw-modifier = none rvdw-switch = 1.0 ; Where to start switching. rvdw = 1.2 ; Cut-off distance
NAMD
Truncation of LJ potential is specified in the run parameter file mdin.
cutoff 12.0 # Cut-off distance common to both electrostatic and van der Waals calculations switching on # Turn switching on/off. The default value is off. switchdist 10.0 # Where to start switching vdwForceSwitching on # Use force switching for VDW. The default value is off.
AMBER force fields
AMBER force fields are developed with hard truncation. Do not use switching or shifting with these force fields.
Selecting Cutoff Distance
GROMACS
Cutoff and neighbour searching is specified in the run parameter file mdp.
rlist = 1.0 ; Cutoff distance for the short-range neighbour list. Active when verlet-buffer-tolerance = -1, otherwise ignored. verlet-buffer-tolerance = 0.002 ; The maximum allowed error for pair interactions per particle caused by the Verlet buffer. To achieve the predefined tolerance the cutoff distance rlist is adjusted indirectly. To override this feature set the value to -1. The default value is 0.005 kJ/(mol ps).
NAMD
When run in parallel NAMD uses a combination of spatial decomposition into grid cells (patches) and Verlet lists with extended cutoff distance.
pairlistdist 14.0 # Distance between pairs for inclusion in pair lists. Should be bigger or equal than the cutoff. The default value is cutoff. cutoff 12.0 # Local interaction distance. Same for both electrostatic and VDW interactions.
Key Points
The calculation of non-bonded forces is the most computationally demanding part of a molecular dynamics simulation.
Non-bonded interactions are truncated to speed up simulations.
The cutoff distance should be appropriate for the force field and the size of the periodic box.
Advancing Simulation in Time
Overview
Teaching: 20 min
Exercises: 0 minQuestions
How is simulation time advanced in a molecular dynamics simulation?
What factors are limiting a simulation time step?
How to accelerate a simulation?
Objectives
Understand how simulation is advanced in time.
Learn how to choose time parameters and constraints for an efficient and accurate simulation.
Learn how to specify time parameters and constraints in GROMACS and NAMD.
Integration Algorithms
- To simulate evolution of the system in time we need to solve Newtonian equations of motions.
- The exact analytical solution is not feasible, the problem is solved numerically.
- The approach used to find a numerical approximation to the exact solution is called integration.
- The integration algorithm advances positions of all atoms by small time steps \(\delta{t}\).
- If the time step is small enough the trajectory will be reasonably accurate.
- A good integration algorithm for MD should be time-reversible and energy conserving.
The Euler Algorithm
- The simplest integration method
- Use $\boldsymbol{r}, \boldsymbol{v},\boldsymbol{a}$ at time $t$ to compute $\boldsymbol{r}(t+\delta{t})$ and $\boldsymbol{v}(t+\delta{t})$:
$\qquad\boldsymbol{r}(t+\delta{t})=\boldsymbol{r}(t)+\boldsymbol{v}(t)\delta{t}+\frac{1}{2}\boldsymbol{a}(t)\delta{t}^2$
$\qquad\boldsymbol{v}(t+\delta{t})=\boldsymbol{v}(t)+\frac{1}{2}\boldsymbol{a}(t)\delta{t}$
- Assumes that acceleration does not change during time step.
- In reality acceleration is a function of coordinates, it changes when atoms move.
Drawbacks:
- Not energy conserving
- Not reversible in time
Applications
- Not recommended for classical MD
- Can be used to integrate some other equations of motion. For example, GROMACS offers a Euler integrator for Brownian (position Langevin) dynamics.
The original Verlet Algorithm
Verlet improved the Euler integration by using positions at two successive time steps. Using positions from two time steps ensured that acceleration changes were taken into account. Essentially, this algorithm is as follows: calculate the next positions using the current positions, forces, and previous positions:
$\qquad\boldsymbol{r}(t+\delta{t})=2\boldsymbol{r}(t)-\boldsymbol{r}(t-\delta{t})+\boldsymbol{a}(t)\delta{t}^2$
- The Verlet algorithm (Verlet, 1967) requires positions at two time steps. It is inconvenient when starting a simulation when only current positions are available.
While velocities are not needed to compute trajectories, they are useful for calculating observables e.g. the kinetic energy. The velocities can only be computed once the next positions are calculated:
$\qquad\boldsymbol{v}(t+\delta{t})=\frac{r{(t+\delta{t})-r(t-\delta{t})}}{2\delta{t}}$
The Verlet algorithm is time-reversible and energy conserving.
The Velocity Verlet Algorithm
Euler integrator can be improved by introducing evaluation of the acceleration at the next time step. You may recall that acceleration is a function of atomic coordinates and is determined completely by interaction potential.
- The velocities, positions and forces are calculated at the same time using the following algorithm:
- Use $\boldsymbol{r}, \boldsymbol{v},\boldsymbol{a}$ at time $t$ to compute $\boldsymbol{r}(t+\delta{t})$: $\qquad\boldsymbol{r}(t+\delta{t})=\boldsymbol{r}(t)+\boldsymbol{v}(t)\delta{t}+\frac{1}{2}\boldsymbol{a}(t)\delta{t}^2$
- Derive $ \boldsymbol{a}(t+\delta{t})$ from the interaction potential using new positions $\boldsymbol{r}(t+\delta{t})$
- Use both $\boldsymbol{a}(t)$ and $\boldsymbol{a}(t+\delta{t})$ to compute $\boldsymbol{v}(t+\delta{t})$: $\quad\boldsymbol{v}(t+\delta{t})=\boldsymbol{v}(t)+\frac{1}{2}(\boldsymbol{a}(t)+\boldsymbol{a}(t+\delta{t}))\delta{t} $
- The Verlet algorithm is time-reversible and energy conserving.
The Velocity Verlet algorithm is mathematically equivalent to the original Verlet algorithm. It explicitly incorporates velocity, solving the problem of the first time step in the basic Verlet algorithm.
- Due to its simplicity and stability the Velocity Verlet has become the most widely used algorithm in the MD simulations.
Leap Frog Variant of Velocity Verlet
- The leap frog algorithm is a modified version of the Verlet algorithm.
- The only difference is that the velocities are not calculated at the same time as positions.
- Positions and velocities are computed at interleaved time points, staggered in such a way that they “leapfrog” over each other.
Velocity, position, and forces are calculated using the following algorithm:
- Derive $ \boldsymbol{a}(t)$ from the interaction potential using positions $\boldsymbol{r}(t)$
- Use $\boldsymbol{v}(t-\frac{\delta{t}}{2})$ and $\boldsymbol{a}(t)$ to compute $\boldsymbol{v}(t+\frac{\delta{t}}{2})$: $\qquad\boldsymbol{v}(t+\frac{\delta{t}}{2})=\boldsymbol{v}(t-\frac{\delta{t}}{2}) + \boldsymbol{a}(t)\delta{t}$
- Use current $\boldsymbol{r}(t)$ and $\boldsymbol{v}(t+\frac{\delta{t}}{2})$ to compute $\boldsymbol{r}(t+\delta{t})$ : $\qquad\boldsymbol{r}(t+\delta{t})=\boldsymbol{r}(t)+\boldsymbol{v}(t+\frac{\delta{t}}{2})\delta{t}$
- The Leap Frog and the Velocity Verlet integrators give equivalent trajectories.
- Restart files are different
Discontinuities in simulations occur when the integrator is changed
Since velocities in Leap-frog restart files are shifted by half a time step from coordinates, changing the integrator to Verlet will introduce some discontinuity.
Integrator pairs of coordinates ($r$) and velocities ($v$) Velocity-Verlet $[r_{t=0}, v_{t=0}], [r_{t=1}, v_{t=1}], [r_{t=2}, v_{t=2}], \ldots $ Leap-Frog $[r_{t=0}, v_{t=0.5}], [r_{t=1}, v_{t=1.5}], [r_{t=2}, v_{t=2.5}], \ldots $ A simulation system will experience an instantaneous change in kinetic energy. That makes it impossible to transition seamlessly between integrators.
Selecting the Integrator
GROMACS
Several integration algorithms available in GROMACS are specified in the run parameter mdp file.
integrator = md ; A leap frog algorithm integrator = md-vv ; A velocity Verlet algorithm integrator = md-vv-avek ; A velocity Verlet algorithm same as md-vv except the kinetic energy is calculated as the average of the two half step kinetic energies. More accurate than the md-vv. integrator = sd ; An accurate leap frog stochastic dynamics integrator. integrator = bd ; A Euler integrator for Brownian or position Langevin dynamics.
NAMD
The only available integration method is Verlet.
How to Choose Simulation Time Step?
Larger time step allows to run simulation faster, but accuracy decreases.
- Verlet family integrators are stable for time steps: \(\delta{t}\leq\frac{1}{\pi{f}}\) where \(f\) is oscillation frequency.
- Vibrations of bonds with hydrogens have period of 10 fs
- Bond vibrations involving heavy atoms and angles involving hydrogen atoms have period of 20 fs
- Stretching of bonds with the lightest atom H is the fastest motion.
- As period of oscillation of a C-H bond is about 10 fs, Verlet integration is stable for time steps < 3.2 fs.
- In practice, the time step of 1 fs is recommended to describe this motion reliably.
- Simulation step can be doubled by constraining bonds with hydrogens.
- Further increase of the simulation step requires constraining bonds between all atoms and angles involving hydrogen atoms.
Other ways to increase simulation speed
- Compute long range electrostatic interactions less often than the short range interactions.
- Employ an intermediate timestep for the short-range non-bonded interactions, performing only bonded interactions at each timestep.
- Hydrogen mass repartitioning allows increasing time step to 4 fs.
Specifying Time Parameters
GROMACS
Time parameters are specified in the mdp run parameter file.
dt = 0.001 ; Time step, ps nsteps = 10000 ; Number of steps to simulate tinit = 0 ; Time of the first step
NAMD
Time parameters are specified in the mdin run parameter file.
TimeStep = 1 # Time step, fs NumSteps = 10000 # Number of steps to simulate FirstTimeStep = 0 # Time of the first step # Multiple time stepping parameters nonbondedFreq 2 # Number of timesteps between short-range non-bonded evaluation. fullElectFrequency 4 # Number of timesteps between full electrostatic evaluations
Constraint Algorithms
- To constrain bond length in a simulation the equations of motion must be modified.
- The goal is to constrain some bonds without affecting dynamics and energetics of a system.
- One way to constrain bonds is to apply constraint force acting along a bond in opposite direction.
In constrained simulation first the unconstrained step is done, then corrections are applied to satisfy constraints.
- As bonds in molecules are coupled satisfying all constraints in a molecule becomes increasingly complex for larger molecules.
- Several algorithms have been developed for use specifically with small or large molecules.
SETTLE
- Very fast analytical solution for small molecules.
- Widely used to constrain bonds in water molecules.
SHAKE
- Iterative algorithm that resets all bonds to the constrained values sequentially until the desired tolerance is achieved.
- Simple and stable, it can be applied for large molecules.
- Works with both bond and angle constraints.
- Slower than SETTLE and hard to parallelize.
- SHAKE may fail to find the constrained positions when displacements are large.
Extensions of the original SHAKE algorithm: RATTLE, QSHAKE, WIGGLE, MSHAKE, P-SHAKE.
LINCS
- Linear constraint solver
- 3-4 times faster than SHAKE and easy to parallelize.
- The parallel LINCS (P-LINCS) allows to constrain all bonds in large molecules.
- Not suitable for constraining both bonds and angles.
Choosing simulation time step
Which of the following methods allows for a time step of 4 fs?
- Constraining bonds involving hydrogen atoms
- Hydrogen mass repartitioning
- Constraining bonds and angles involving hydrogen atoms
- Constraining angles between heavy atoms
Solution
Hydrogen mass repartitioning
Specifying Constraints
GROMACS
SHAKE, LINCS and SETTLE constraint algorithms are implemented. They are selected via keywords in mdp input files
constraints = h-bonds ; Constrain bonds with hydrogen atoms constraints = all-bonds ; Constrain all bonds constraints = h-angles ; Constrain all bonds and additionally the angles that involve hydrogen atoms constraints = all-angles ; Constrain all bonds and angles constraint-algorithm = LINCS ; Use LINCS constraint-algorithm = SHAKE ; Use SHAKE shake-tol = 0.0001 ; Relative tolerance for SHAKE, default value is 0.0001.
SETTLE can be selected in the topology file:
[ settles ] ; OW funct doh dhh 1 1 0.1 0.16333
NAMD
SHAKE and SETTLE constraint algorithms are implemented. They are selected via keywords in simulation input file.
rigidBonds water # Use SHAKE to constrain bonds with hydrogens in water molecules. rigidBonds all # Use SHAKE to constrain bonds with hydrogens in all molecules. rigidBonds none # Do not constrain bonds. This is the default. rigidTolerance 1.0e-8 # Stop iterations when all constrained bonds differ from the nominal bond length by less than this amount. Default value is 1.0e-8. rigidIterations 100 # The maximum number of iterations. If the bond lengths do not converge, a warning message is emitted. Default value is 100. rigidDieOnError on # Exit and report an error if rigidTolerance is not achieved after rigidIterations. The default value is on. useSettle on # If rigidBonds are enabled then use the SETTLE algorithm to constrain waters. The default value is on.
Key Points
A good integration algorithm for MD should be time-reversible and energy conserving.
The most widely used integration method is the velocity Verlet.
Simulation time step must be short enough to describe the fastest motion.
Time step can be increased if bonds involving hydrogens are constrained.
Additional time step increase can be achieved by constraining all bonds and angles involving hydrogens.
Periodic Boundary Conditions
Overview
Teaching: 20 min
Exercises: 0 minQuestions
How to simulate a bulk (gas, liquid or solid) system by using only a small part?
Objectives
Understand why and when periodic boundary conditions are used.
Understand how shape and size of a periodic box can affect simulation.
Learn how to set periodic box parameters in GROMACS and NAMD.
What is PBC and why is it important?
- In most cases we want to simulate a system in realistic environment, such as solution.
- Try simulating a droplet of water, it will simply evaporate.
- We need a boundary to contain water and control temperature, pressure, and density.
- Periodic boundary conditions allow to approximate an infinite system by using a small part (unit cell).
- Unit cell is surrounded by an infinite number of translated copies in all directions (images).
- When a particle in unit cell moves across the boundary it reappears on the opposite side.
- Each molecule always interacts with its neighbors even though they may be on opposite sides of the simulation box.
- Artifacts caused by the interaction of the isolated system with a vacuum are replaced with the PBC artifacts which are in general much less severe.
Choosing periodic box size and shape.
Box shape
Cubic periodic box
- A cubic box is the most intuitive and common choice
- Cubic box is inefficient due to irrelevant water molecules in the corners.
- Ideal simulation system is a sphere of water surrounding the macromolecule, but spheres can’t be packed to fill space.
Octahedral and dodecahedral periodic boxes
- The dodecahedron (12 faces) or the truncated octahedron (14 faces) are closer to sphere.
Space filling with truncated octahedrons | |
---|---|
- These shapes work reasonably well for globular macromolecules.
Triclinic periodic boxes
- Any repeating shape that fills all of space has an equivalent triclinic unit cell.
- A periodic box of any shape can be represented by a triclinic box with specific box vectors and angles.
- The optimal triclinic cell has about 71% the volume of the optimal rectangular cell.
Box size
- The minimum box size should extend at least 10 nm from the solute.
- The shortest periodic box vector should be at least twice bigger than the cuf-off radius.
- In simulations with macromolecules solvent molecules should not “feel” both sides of a solute.
Pitfalls
- A simulation system with elongated solute in cubic or dodecahedral box will have a large amount of water located far away from the solute.
- Consider using a narrow rectangular box.
- Rotation of elongated macromolecules and/or conformational changes must be taken in consideration.
- Constrain the rotational motion.
- The box shape itself may influence conformational dynamics by restricting motions in certain directions [1], [2].
Periodic box
Which of the statements is incorrect?
- The smallest dimension of a periodic box must be at least as large as the cut-off radius
- The minimum box size should extend at least 10 nm from the solute
- The smallest dimension of a periodic box must be at least as large as the double cut-off radius
- There is an equivalent triclinic unit cell for any repeating shape that occupies all of space
Solution
The smallest dimension of a periodic box must be at least as large as the cut-off radius
Specifying Periodic Box
GROMACS
The box specification is integrated into structure files. The box parameters can be set using the editconf program or manually. The editconf program accepts the following options:
-bt Box type triclinic, cubic, dodecahedron, octahedron -box Box vectors lengths, a, b, c nm -angles Box vectors angles, bc, ac, ab degrees -d Distance between the solute and the box nm Example:
module load StdEnv/2020 gcc gromacs wget http://files.rcsb.org/view/1lyz.pdb gmx pdb2gmx -f 1lyz.pdb -ff amber99sb-ildn -water spce -ignh gmx editconf -f conf.gro -o conf_boxed.gro -d 1.0 -bt cubic
In the example above the editconf program will append box vectors to the structure file ‘conf.gro’ and save it in the file ‘conf_boxed.gro’. The 9 components of the three box vectors are saved in the last line of the structure file in the order: xx yy zz xy xz yx yz zx zy. Three of the values (xy, xz, and yz) are always zeros because they are duplicates of (yx, zx, and zy). The values of the box vectors components are related to the unit cell vectors \(a,b,c,\alpha,\beta,\gamma\) from the CRYST1 record of a PDB file with the equations:
\[xx=a, yy=b\cdot\sin(\gamma), zz=\frac{v}{(a*b*\sin(\gamma))}\] \[xy=0, xz=0, yx=b\cdot\cos(\gamma)\] \[yz=0, zx=c\cdot\cos(\beta), zy=\frac{c}{\sin(\gamma)}\cdot(cos(\alpha)-cos(\beta)\cdot\cos(\gamma))\] \[v=\sqrt{1-\cos^2(\alpha)-cos^2(\beta)-\cos^2(\gamma) +2.0\cdot\cos(\alpha)\cdot\cos(\beta)\cdot\cos(\gamma)}\cdot{a}\cdot{b}\cdot{c}\]NAMD
Periodic box is specified in the run parameter file by three unit cell vectors, the units are Å.
# cubic box cellBasisVector1 100 0 0 cellBasisVector2 0 100 0 cellBasisVector3 0 0 100
Alternatively periodic box parameters can be read from the .xsc (eXtended System Configuration) file by using the extendedSystem keyword. If this keyword is used cellBasisVectors are ignored. NAMD always generates .xsc files at runtime.
extendedSystem restart.xsc
Comparing periodic boxes
Using the structure file ‘conf.gro’ from the example above generate triclinic, cubic, dodecahedral and truncated octahedral boxes with the 15 Å distance between the solute and the box edge.
Which of the boxes will be the fastest to simulate?
References:
- Molecular dynamics simulations with constrained roto-translational motions: Theoretical basis and statistical mechanical consistency
- The effect of box shape on the dynamic properties of proteins simulated under periodic boundary conditions
- Periodic box types in Gromacs manual
Key Points
Periodic boundary conditions are used to approximate an infinitely large system.
Periodic box should not restrict molecular motions in any way.
The macromolecule shape, rotation and conformational changes should be taken into account in choosing the periodic box parameters.
Degrees of Freedom
Overview
Teaching: 10 min
Exercises: 0 minQuestions
How is kinetic energy contained and distributed in a dynamic molecular system
How to increase efficiency of thermodynamic sampling
How reduction of degrees of freedom affects dynamics
Objectives
Explain how kinetic energy is distributed in a simulation system
Understand how to correctly increase efficiency of thermodynamic sampling
Molecular degrees of freedom
- Refer to the number of unique ways a molecule may move.
- Molecular degrees of freedom describe how kinetic energy is contained and distributed in a molecule.
- Translational, rotational, and vibrational components.
Equipartition theorem
- The energy is shared equally among the degrees of freedom (equipartition theorem).
- Each degree of freedom has an average energy of \(\frac{1}{2}k_BT\) and contributes \(\frac{1}{2}k_B\) to the system’s heat capacity.
Degrees of freedom and thermodynamics properties
- When kinetic energy is applied to simulation systems containing molecules with different degrees of freedom, the temperature increase will vary.
- If energy is spread over many places (degrees of freedom), the temperature will change less.
- The number of degrees of freedom is an essential quantity for estimating various thermodynamic variables for a simulation system (such as heat capacity, entropy, and temperature).
Translational degrees of freedom
- Atoms and molecules have three degrees of freedom associated with the translation of their centers of mass about each coordinate axis.
- Translational degrees of freedom in three dimensions yield \(\frac{3}{2}k_BT\) of energy.
Rotational degrees of freedom
- Atoms have a negligible amount of rotational energy because their mass is concentrated in the nucleus which is very small (about \(10^{-15}\) m).
- A linear molecule, has two rotational degrees of freedom.
- A nonlinear molecule, where the atoms do not lie along a single axis, has three rotational degrees of freedom, because it can rotate around any of three perpendicular axes.
- The rotational degrees of freedom contribute \(k_BT\) to the energy of linear molecules and \(\frac{3}{2}k_BT\) to the energy of non-linear molecules.
Vibrational degrees of freedom
- A diatomic molecule has one molecular vibration mode.
- A linear molecule with N atoms has 3N − 5 vibrational modes.
- A non-linear molecule with N atoms has 3N − 6 vibrational modes.
Angle bend | Symmetric stretch | Asymmetric stretch |
---|---|---|
- There are two degrees of freedom for each vibrational mode.
- One degree involves the kinetic energy of the moving atoms, and the other involves the potential energy of the springlike chemical bond(s).
- Each vibrational degree of freedom contributes \(k_BT\) to the energy of a molecule. However, this is valid only when \(k_BT\) is much bigger than energy spacing between vibrational modes.
- At low temperature this condition is not satisfied, only a few vibrational states are occupied and the equipartition principle is not typically applicable.
Increasing efficiency of thermodynamic sampling.
By reducing the number of degrees of freedom we can increase thermodynamic sampling efficiency.
- Force fields remove the electrons’ degrees of freedom by replacing them with atom centered charges.
- An implicit solvent model eliminates the degrees of freedom associated with the solvent molecules.
- Bond constraints eliminate vibrational degrees of freedom and make possible to use longer time steps.
- Constraints including angles and dihedrals can be also applied.
- Coarse-grained models reduce degrees of freedom by averaging interactions across groups of particles.
Reduction of the number of degrees of freedom may lead to artifacts.
Bond and angle constraints can:
- slow down dihedral angle transitions [1]
- shift the frequencies of the normal modes in biomolecules [2]
- perturb the dynamics of polypeptides [3].
Coarse-grained models: - Produce approximate dynamics by sacrificing atomic details.
Key Points
Constraints decrease the number of degrees of freedom
Imposing constraints can affect simulation outcome
Electrostatic Interactions
Overview
Teaching: 10 min
Exercises: 0 minQuestions
How electrostatic interactions are calculated in periodic systems?
Objectives
Learn what parameters control the accuracy of electrostatic calculations
Coulomb interactions
- Long range: $V_{Elec}=\frac{q_{i}q_{j}}{4\pi\epsilon_{0}\epsilon_{r}r_{ij}}$
- Computing Coulomb potentials is often the most time consuming part of any MD simulation.
- Fast and efficient algorithms are required for these calculations.
Particle Mesh Ewald (PME)
- The most widely used method using the Ewald decomposition.
- The potential is decomposed into two parts: fast decaying and slow decaying.
Fast decaying short-ranged potential (Particle part).
- Sum of all pairwise Coulomb interactions within a cutoff radius
- Implements same truncation scheme as the LJ potentials.
Slow decaying long-ranged potential (Mesh part).
- Slowly varying, smooth and periodic function.
- All periodic functions can be represented with a sum of sine or cosine components.
- Slowly varying functions can be accurately described by only a limited number of low frequency components (k vectors).
PME algorithm
- Long-range electrostatic interactions are evaluated using 3-D grids in reciprocal Fourier space.
- Assign charges to grid cells. Charges in grid cells are obtained by interpolation.
- Compute Fourier transform.
- Compute potential. Coulomb interaction decays rapidly in Fourier space, and summation converges fast.
- Compute inverse Fourier transform.
- Interpolate gridded potentials back to atomic centers.
Simulation parameters controlling speed and accuracy of PME calculations.
- Grid spacing. Lower values lead to higher accuracy but considerably slow down the calculation.
- Grid dimension. Higher values lead to higher accuracy but considerably slow down the calculation.
- Direct space tolerance. Controls the splitting into direct and reciprocal part. Higher tolerance shifts more charges into Fourier space.
- Interpolation order is the order of the B-spline interpolation. The higher the order, the better the accuracy.
Challenge: Electrostatic interactions
Which of the following statements is incorrect?
For more accurate electrostatic calculations you need to:
- Decrease grid spacing
- Increase the grid dimensions
- Increase direct space tolerance
- Increase the interpolation order
Solution
Increase direct space tolerance
PME variables
Variable \ MD package GROMACS NAMD AMBER Fourier grid spacing fourierspacing (1.2) PMEGridSpacing (1.5) Grid Dimension [X,Y,Z] fourier-[nx,ny,nz] PMEGridSize[X,Y,Z] nfft[1,2,3] Direct space tolerance ewald-rtol (\(10^{-5}\)) PMETolerance (\(10^{-5}\)) dsum_tol (\(10^{-6}\)) Interpolation order pme-order (4) PMEInterpOrder (4) order (4)
Key Points
Calculation of electrostatic potentials is the most time consuming part of any MD simulation
Long-range part of electrostatic interactions is calculated by approximating Coulomb potentials on a grid
Denser grid increases accuracy, but significantly slows down simulation
Controlling Temperature
Overview
Teaching: 20 min
Exercises: 0 minQuestions
What is temperature on the molecular level?
Why do we want to control the temperature of our MD-simulations?
What temperature control algorithms are commonly used?
What are the strengths and weaknesses of these common thermostats?
Objectives
Remind us how Maxwell-Boltzmann distributions relate to temperature.
Quickly review thermodynamic ensembles.
Learn about different thermostats and get an idea how they work.
Learn where thermostats that don’t produce correct thermodynamic ensembles can still be very useful.
Temperature at the molecular level.
- Temperature is defined by the average kinetic energy of all the particles.
- A system in equilibrium, will have all of its energy distributed in the most probable way.
- Particles in a system at equilibrium don’t all have the same velocity.
- Velocities follow a distribution that depends on their mass and the temperature of the system:
Maxwell-Boltzmann distributions
Krypton at different temperatures | Noble gases with different mass at 298K |
Velocity distributions obtained from MD simulations of water at different temperature.
Thermodynamic ensembles
MD simulations are usually simulate one of the following thermodynamic ensembles:
- The microcanonical or constant NVE ensemble.
- The canonical or constant NVT ensemble.
- The isothermal-isobaric or constant NPT ensemble.
- Simulation of NVE ensemble is relatively easy to achieve, as long as the MD code manages to conserve the energy of the system.
- The NVT ensemble is more practically relevant, as in the real world it is much easier to manage the temperature of a system rather than it’s energy.
- Need to use a temperature control algorithm, a thermostat.
- In many cases constant ambient pressure and temperature are desired. In addition to using a thermostat to control the temperature, we also need to use a pressure control algorithm, a barostat.
Temperature Control Algorithms
- Allow energy to enter and leave the simulated system to keep its temperature constant.
- In practice thermostats do that by adjusting the velocities of a subset of particles.
- The methods of maintaining temperature fall into four categories:
- Strong coupling methods
- Weak coupling methods
- Stochastic methods
- Extended system dynamics
1. Strong coupling methods
Velocity rescaling
- Rescale the velocities at each step (or after a preset number of steps) to get the desired target temperature.
Velocity reassignment
-
Periodically assign new randomized velocities so that the entire system is set to the desired temperature.
- Both methods do not generate the correct canonical ensemble.
- Not recommended for equilibrium dynamics
- Useful for heating or cooling.
Downsides:
- Rescaling will make hot spots even hotter.
- Temperature reassignment avoids this problem, but the kinetic energy of particles is no longer consistent with their potential energy, and thus needs to be redistributed.
2. Weak coupling methods
Berendsen thermostat
- Rescale the velocities of all particles to remove a fraction of the difference from the predefined temperature.
- The rate of temperature equilibration is controlled by strength of the coupling.
- The Berendsen thermostat a predictably converging and robust thermostat.
- Very useful when allowing the system to relax.
Downsides:
- Cannot be mapped onto a specific thermodynamic ensemble.
- Produces an energy distribution with a lower variance than of a true canonical ensemble [Basconi-2013 and Shirts-2013].
- Should be avoided for production MD simulations.
Heat flows between the simulation system and the heat bath with the rate defined by a time constant \(\tau_T\)
3. Stochastic methods
Randomly assign a subset of atoms new velocities based on Maxwell-Boltzmann distributions for the target temperature. Randomization interferes with correlated motion and thus slows down the system’s kinetics.
Andersen thermostat
- Assign a subset of atoms new velocities that are randomly selected from the Maxwell-Boltzmann distribution for the target temperature [Andersen-1980].
- “massive Andersen” thermostat randomizes the velocities of all atoms [Basconi-2013].
The Andersen thermostat:
- Correctly samples the canonical ensemble
- Does not conserve momentum.
- Can impair correlated motions and thus slow down the kinetics of the system.
- Not recommended when studying kinetics or diffusion properties of the system.
The Lowe-Andersen thermostat
- A variant of the Andersen thermostat that conserves momentum [Koopman-2006].
- Perturbs the system dynamics to a far less than the original Andersen method.
- Improves suppressed diffusion in the system relative to the original Andersen.
Bussi stochastic velocity rescaling thermostat
- Extension of the Berendsen method corrected for sampling the canonical distribution.
- The velocities of all the particles are rescaled by a properly chosen random factor [Bussi-2007].
Langevin thermostat
- Mimics the viscous aspect of a solvent and interaction with the environment.
- Adds a frictional force and a random force.
- The frictional force and the random force combine to give the correct canonical ensemble.
- The amount of friction is controlled by the damping coefficient.
4. Extended system thermostats
Nosé-Hoover thermostat
- The heat bath is integrated with the system by addition of an artificial variable associated with a fictional “heat bath mass” to the equations of motion.
- The temperature can be controlled without involving random numbers.
- Correlated motions are not impaired
- Better description of kinetics and diffusion properties Nose-1984, Hoover-1985.
Drawbacks:
- Periodic temperature fluctuations with the frequency proportional to the “heat bath mass”.
- Imparts the canonical distribution as well as ergodicity (space-filling).
The time constant parameter in this thermostat controls the period of temperature fluctuations at equilibrium.
Nosé-Hoover-chains
- A modification of the Nosé-Hoover thermostat which includes not a single thermostat variable but a chain of variables with different “masses” Martyna-1992.
- Chaining variables with different masses helps to suppress oscillations.
Global and local thermostats
- Global thermostats control temperature of all atom in a system uniformly.
-
This may lead to cold solute and hot solvent due to a slow heat transfer.
- Local thermostats allow to control temperature in selected groups of atoms independently.
- Local thermostats work well for large solutes.
- Temperature of small solutes this approach may significantly fluctuate leading to unrealistic dynamics.
Challenge: Thermodynamic ensembles
What thermodynamic ensemble describes an isolated system?
- Canonical
- Grand canonical
- Isothermal-isobaric
- Microcanonical
Solution
Microcanonical
Challenge: Thermostats
Which of the following statements is incorrect?
- Stochastic temperature control methods impair conformational transitions
- The Berendsen thermostat is very useful for heating simulation systems
- Extended system thermostats control temperature without random velocity rescaling
- Local thermostats work well for small groups of atoms
Solution
Local thermostats work well for small groups of atoms
Specifying local thermostats
With NAMD it is possible to set coupling coefficients for each atom in occupancy or beta column of a pdb file:
langevinFile
langevinColtCoupleFile
tCoupleFColIn GROMACS temperature of a selected groups of atoms can be controlled independently using tc-grps. Temperature coupling groups are coupled separately to temperature bath.
Selecting thermostats in molecular dynamics packages
Thermostat/MD package GROMACS NAMD AMBER velocity rescaling reascaleFreq (steps) velocity reassignment reassignFreq (steps) Andersen tcoupl = andersen massive-Andersen tcoupl = andersen-massive ntt = 2 Lowe-Andersen loweAndersen on Berendsen tcoupl = berendsen tCouple on ntt = 1 Langevin langevin on ntt = 3 Bussi tcoupl = V-rescale stochRescale on Nose-Hoover tcoupl = nose-hoover Nose-Hoover-chains nh-chain-length (default 10)
Key Points
On the molecular level, temperature manifests itself as a number of particles having a certain average kinetic energy.
Some temperature control algorithms (e.g. the Berendsen thermostat) fail to produce kinetic energy distributions that represent a correct thermodynamic ensemble.
Other thermostats, like Nosé-Hoover, produce correct thermodynamic ensembles but can take long to converge.
Even though the the Berendsen thermostat fails to produce correct thermodynamic ensembles, it can be useful for system relaxation as it is robust and converges fast.
Controlling Pressure
Overview
Teaching: 10 min
Exercises: 0 minQuestions
Why do we want to control the pressure of our MD-simulations?
What pressure control algorithms are commonly used?
What are the strengths and weaknesses of these common barostats?
Objectives
Identify the appropriate method for controlling pressure
Introduction
The role of pressure control algorithms is to keep pressure in the simulation system constant or to apply an external stress to the simulated system.
- Pressure is kept on its target value by adjusting the volume of a periodic simulation system.
- Pressure is a force exerted by collision of particles with the walls of a closed container.
- The virial equation is used to obtain the pressure:
$ \qquad {P}=\frac{NK_{B}T}{V}+\frac{1}{3V}\langle\sum{r_{ij}F_{ij}}\rangle$
- The first term in this equation describes pressure of an ideal gas (no interaction between molecules).
- The second contribution comes from internal forces acting on each atom.
- Well suited for MD because forces are evaluated at each simulation step.
Pressure Control Algorithms
- Regulate pressure by adjusting the volume
- In practice barostats do that by scaling coordinates of each atom by a small factor.
- The methods of maintaining pressure fall into four categories:
- Weak coupling methods
- Extended system methods
- Stochastic methods
- Monte-Carlo methods
1. Weak coupling methods
Berendsen pressure bath coupling.
- Conceptually similar to Berendsen thermostat.
- Available in all simulation packages.
- Change the volume by an increment proportional to the difference between the internal pressure and pressure in a weakly coupled bath.
- Very efficient in equilibrating the system.
Downsides:
- Does not sample the exact NPT statistical ensemble.
- Induces artifacts into simulations of inhomogeneous systems such as aqueous biopolymers or liquid/liquid interfaces.
- Should be avoided for production MD simulations.
The time constant for pressure bath coupling is the main parameter of the Berendsen thermostat. The pressure of the system is corrected such that the deviation exponentially decays with a lifetime defined by this constant.
Reference: Molecular dynamics with coupling to an external bath
2. Extended system methods
- Extended system methods originate from the classical theoretical work of Andersen.
- He included an additional degree of freedom, the volume of a simulation cell.
- Volume adjusts itself to equalize the internal and external pressure.
- Volume serves as a piston, and is given a fictitious “mass” controlling the decay time of pressure fluctuations.
- Extended system methods are time-reversible. They can be used to integrate backwards, for example, for transition path sampling.
Parrinello-Rahman barostat
- Extension of the Andersen method allowing changes in the shape of the simulation cell [Parrinello and Rahman, 1980].
- Further extended to include external stresses [Parrinello and Rahman, 1981].
- Useful to study structural transformations in solids under external stress.
- Equations of motion are similar to Nosé-Hoover barostat, and in most cases it is used with the Nosé-Hoover thermostat.
Downsides:
- Volume may oscillate with the frequency proportional to the piston mass.
Nosé-Hoover barostat
- First application of the method analogous to Andersen’s barostat for molecular simulation [Nosé and Klein, 1983].
- The Nosé-Hoover equations of motion are only correct in the limit of large systems.
References: [Hoover, 1986], [Martyna, 1994].
MTTK (Martyna-Tuckerman-Tobias-Klein) barostat.
- Extension of the Nosé-Hoover and Nosé-Hoover chain thermostat, performs better for small systems [Martyna et al., 1996].
3. Stochastic methods
Langevin piston pressure control.
- Based on Langevin thermostat.
- The equations of motion resemble MTTK equations.
- An additional damping (friction) force and stochastic force are introduced.
- Random collisions eliminate oscillation of the volume associated with the piston mass.
Reference: Constant pressure molecular dynamics simulation: The Langevin piston method
MTTK and Langevin barostats produce identical ensembles | Langevin barostat oscillates less then MTTK and converges faster due to stochastic collisions and damping. |
Reprinted with permission from Rogge et al. 2015, A Comparison of Barostats for the Mechanical Characterization of Metal−Organic Frameworks, J Chem Theory Comput. 2015;11: 5583-97. doi:10.1021/acs.jctc.5b00748. Copyright 2015 American Chemical Society.
Stochastic Cell Rescaling
- Improved version of the Berendsen barostat.
- Adds stochastic term to rescaling matrix.
- Produces correct fluctuations of local pressure for NPT ensemble.
- Pressure converges fast without oscillations.
- Can be used for all stages of MD, including production.
Reference: [Bernetti and Bussi (2020)]
4. Monte-Carlo pressure control.
- Recently several efficient Monte Carlo methods have been introduced.
- Sample volume fluctuations at a predefined number of steps at a given constant external pressure.
- Generate a random volume change, evaluate the potential energy. The volume move is then accepted with the standard Monte-Carlo probability.
- Do not compute virial, so pressure is not available at the runtime, and not printed in energy files.
References:
- Molecular dynamics simulations of water and biomolecules with a Monte Carlo constant pressure algorithm
- Constant pressure hybrid Molecular Dynamics–Monte Carlo simulations
Pitfalls
- If the difference between the target and the real pressure is large, the program will try to adjust the density too quickly.
- Rapid change of the system size may lead to simulation crash.
- To ensure stability of a simulation volume must be adjusted very slowly with a small likely
Selecting barostats in molecular dynamics packages
Thermostat\MD package GROMACS NAMD AMBER Berendsen pcoupl = Berendsen BerendsenPressure on barostat = 1 Stoch. cell rescaling pcoupl = C-rescale Langevin LangevinPiston on Monte-Carlo barostat = 2 Parrinello-Rahman pcoupl = Parrinello-Rahman MTTK pcoupl = MTTK
Key Points
Each barostat or thermostat technique has its own limitations and it is your responsibility to choose the most appropriate method or their combination for the problem of interest.
Water models
Overview
Teaching: 15 min
Exercises: 0 minQuestions
Why do we want to use water in out simulations?
What water models are commonly used?
How to chose a model of water for a simulation?
Objectives
Learn how water molecules are modeled in simulations
Understand how choice of water model can affect a simulation
Introduction
Realistic water environment is essential for accurate simulation of biological molecules.
Two approaches to solvation:
- Add explicit water molecules.
- Treat water as a continuous medium instead of individual molecules.
Continuum models
- Significantly faster than explicit solvation.
- Cannot reproduce the microscopic details of the protein–water interface.
- Do not produce same conformational ensembles as explicit water (salt bridges are over-stabilized, a higher-than-native alpha helix population).
- Useful for calculation of binding free energies or for flexible protein-protein docking.
Most molecular dynamics simulations are carried out with explicit water molecules.
Explicit models
- Typically water molecules account for over 80% of the particles in the simulation.
- Water–water interactions dominate the computational cost of such simulations.
-
The water model needs to be fast and accurate.
- Explicit water models are empirical models derived to reproduce some bulk properties in a particular phase.
- None of water models accurately reproduce all of the key properties of bulk water.
- Which water model is optimal for the simulation depends on the desired properties of interest.
An early water model.
- Rigid geometry closely matching actual water molecule.
- O-H and H-H distances are constrained with harmonic potential.
- Point charges replace electron density distribution.
How are water models derived?
A good water model must faithfully reproduce six bulk properties of water:
- Static dielectric constant, \(\epsilon_{0}\)
- Self diffusion coefficient, \(\vec{D}\)
- Heat of vaporization, \(\Delta{H}_{vap}\)
- Isobaric heat capacity, \(\vec{C}_{p}\)
- Thermal expansion coefficient, \(\alpha_{p}\)
- Isothermal compressibility, \(\kappa_{T}\)
Many water models with different level of complexity (the number of interaction points) have been developed. We will discuss only the models most widely used in simulations, and refer you to the excellent article in Wikipedia for an overview of all water models.
3-charge 3-point models.
- Three interaction points corresponding to the atoms of the water molecule.
- Only oxygen atom has Lennard-Jones parameters.
- 3-site models are commonly used because computationally they are highly efficient.
TIP3P (transferable intermolecular potential) $\circ$ Rigid geometry closely matching actual water molecule. |
|
SPC/E (simple point charge) $\circ$ More obtuse tetrahedral angle of 109.47°. $\circ$ Adds an average polarization correction to the potential energy function. $\circ$ Reproduces density and diffusion constant better than the original SPC model. |
3-charge 4-point models.
- The negative charge is not centered on the oxygen atom, but shifted towards hydrogen atoms
- The position of charge is represented with the fourth dummy atom (EP)
- EP is located near the oxygen along the bisector of the HOH angle.
TIP4P-Ew $\circ$ Improves association/dissociation balance compared to 3-point models. |
Water models have their limitations.
- Early water models were developed with cut-off of electrostatic interactions. Using these models with full electrostatic method results in stronger electrostatic interactions and consequently higher density.
- Most of the more complex new water models attempt to reproduce specific properties of a specific phase, but this comes at the expense of other properties.
- TIP3P model predicts hydration free energies of small neutral molecules more accurately than the TIP4PEw model.
- 4-charge 5-point model TIP5P predicts excellent water structure, but poor hydration energies.
Challenges in developing water models.
- Finding an accurate yet simplified description of the charge distribution that can adequately account for the hydrogen bonding in the liquid phase.
- Traditional approach is to place point charges on or near the nuclei.
- Electrostatic potential of water molecule is reproduced considerably more accurately with 3 point charges when they form tight cluster.
Optimal Point Charges (OPC and OPC3)
- Designed without geometrical restraints.
- Considerably better reproduces the six bulk properties of water.
Polarizable water model OPC3-pol
- Polarization is usually achieved by adding model oscillators, referred to as Drude particles.
- Simulations of classical Drude oscillator are computationally expensive (self-consistent procedure).
-
Drude water models require matching, polarizable force fields for biomolecules.
- OPC3-pol model treats the Drude particle as an “ordinary” atom in the molecular dynamics system
- Offers high computational efficiency
- Can be used with the existing biomolecular force fields
Quality of different water models
- The test models in which the moments were close to the QM values had low quality.
- The models that scored better had moments very different from the QM moments.
The distribution of quality scores for different water models in the space of dipole (μ) and quadruple (QT) moments. Figures from [2].
- Using only three point charges, is not sufficient to represent the charge distribution accurately.
Performance Considerations
- Computation cost is proportional to the number of pairwise distances.
- 3-charge 3-point model: 9 distances
- 3-charge 4-site model: 10 distances (3x3 Coulomb interactions plus one VDW O–O interaction).
Other things to consider
- Water models have traditionally only been parameterized for a single temperature of 298K (SPC/E, TIP3P, etc.)
Force Field Parameters of the common Water Models
TIP3P SPC/E TIP4P-Ew OPC OPC3 OH 0.9572 1.0 0.9572 0.8724 0.9789 HH 1.5136 1.63 1.5136 1.3712 1.5985 HOH 104.52 109.47 104.52 103.6 109.47 OM - - 0.125 0.1594 - A(12) 582.0 629.4 656.1 865.1 667.9 B(6) 595.0 625.5 653.5 858.1 qO −0.834 −0.8476 −1.04844 −1.3582 -0.8952 qH +0.417 +0.4238 +0.52422 +0.6791 +0.4476 Nonbonded OPC3: sigma=1.7814990, epsilon=0.163406
References
- Structure and Dynamics of the TIP3P, SPC, and SPC/E Water Models at 298 K
- Building Water Models: A Different Approach
- Effect of the Water Model in Simulations of Protein–Protein Recognition and Association
- Accuracy limit of rigid 3-point water models
- A fast polarizable water model for atomistic simulations
Key Points
Continuum models cannot reproduce the microscopic details of the protein–water interface
Water–water interactions dominate the computational cost of simulations
Good water model needs to be fast and accurate in reproduction of the bulk properties of water
End of Workshop
Overview
Teaching: min
Exercises: minQuestions
Objectives
Key Points
Supplemental: Overview of the Common Force Fields
Overview
Teaching: 0 min
Exercises: 0 minQuestions
What are the categories of molecular dynamics force fields?
What force fields are available?
What systems they work best with?
Objectives
Be able to recognize the strengths and weaknesses of different types of force fields.
Find out which force fields are available and which systems they are most suitable for.
Identify the original papers that introduced force fields.
- Early Force Fields
- Evolution of Force Fields
- Force Fields Aimed at Improving Quality of Molecular Interactions
- CVFF (Consistent Valence Force Field) (Maple & Hagler, 1988)
- CFF93 (An ab initio all-atom force field for polycarbonates) (Sun et al., 1994)
- CFF, formerly CFF95 (Jonsdottir & Rasmussen, 2000)
- MM3 (N.L.Allinger et al., 1989), MM4 (N.L.Allinger et al., 1996)
- COMPASS (Condensed-phase Optimized Molecular Potentials for Atomistic Simulation Studies)
- Biomolecular Force Fields for Large-Scale Simulations.
- Polarizable Force fields
- Force Fields for Small Molecules
- Additional Reading
Early Force Fields
CFF (Consistent Force Field) (1968)
CFF was the first modern force field (Lifson, 1968) Introduced a concept of a ‘consistent force field’. Introduced a methodology for deriving and validating force fields. The term ‘consistent’ emphasized importance of the ability to describe a wide range of compounds and physical observables (conformation, crystal structure, thermodynamic properties and vibrational spectra). After the initial derivation for hydrocarbons CFF was extended to proteins, but was very crude at that time.
Allinger Force Fields (MM1 - MM4) (1976-1996)
- MM1 (N.L. Allinger, 1976) - Class 1
- MM2 (N.L. Allinger, 1977) - Class 1
- MM3 N.L.Allinger et al., 1989 - Class 2
- MM4 (N.L.Allinger et al., 1996) - Class 3
Target data included electron diffraction, vibrational spectra, heats of formation, and crystal structures. The calculation results were verified via comparison with high-level ab-initio quantum chemistry computations, and the parameters were additionally adjusted. Intended for calculations on small and medium size organic molecules.
ECEPP (Empirical Conformational Energy Program for Peptides) (1975)
ECEPP was the first force field targeting polypeptides and proteins. Crystal data of small organic compounds and semi-empirical QM calculations were used extensively in derivation of this force field. As more experimental data became available, force field parameters have been refined and modified versions were published.
- ECEPP (F. A. Momany, et al., 1975)
- UNICEPP (L.G. Dunfield, 1978) The united atoms version of ECEPP, developed for the conformational analysis of large molecules
- ECEPP-02 (G. Nemethy et al, 1983)
- ECEPP-03 (G. Nemethy et al, 1992)
- ECEPP-05 (Y. A. Arnautova et al., 2006)
Evolution of Force Fields
Early force field advances
Early force field development focused on developing mathematical forms for MM energy function and methods of deriving parameters. Researchers investigated various forms of potential energy functions, and experimented with hydrogen bonding potential, combination rules, and out of plane angle potentials during this period.
United atoms force fields.
United Atoms Model was developed to speed up large-scale simulations. It represents nonpolar carbons and their bonded hydrogens as a single particle. United Atoms force fields can significantly reduce the size of most simulations, since roughly half of the atoms in biological or other organic macromolecules are hydrogens. Additional advantage is the efficiency gain in conformational sampling. The first united atoms force field was UNICEPP (L.G. Dunfield, 1978).
According to early comparisons between all-atom and united-atom simulations, united-atom force fields adequately represent molecular vibrations and bulk properties of small molecules.
After this initial success all major developers of protein force fields implemented united atoms models:
It became apparent, however, that there were some limitations:
- In the absence of explicit hydrogens, hydrogen bonds cannot be accurately treated;
- π-stacking cannot be represented without explicitly including hydrogens in aromatic groups;
- when hydrogens were combined with polar heavy atoms, dipole and quadrupole moments were found to be inaccurate.
New approaches were found to overcome the limitations of united-atom force fields. For example, only aliphatic hydrogens, which are not significantly charged and do not participate in hydrogen bonds, are represented as united atoms while other hydrogens are represented explicitly. In this way, the limitations of the united-atom force field are partially mitigated while preserving most of the benefits of the united-atom force field.
Refinement after the initial introduction.
Statistical errors caused by relatively short simulation lengths and systematic errors caused by inaccurate force fields limit the predictive power of MD simulations. Deficiencies of the force fields remained undetected when statistical errors caused by insufficient sampling prevailed. Increase of the computing power over last two decades allowed for much longer simulations and resulted in a significant reduction of statistical errors. This led to the detection of force field deficiencies such as large deviations in different observables and inability to predict conformations of proteins and peptides. Various approaches were undertaken to improve force fields such as:
-
Use an all-atom model to increase accuracy. Most force fields (CHARMM22, AMBER ff99 and GAFF, OPLS-AA, OPLS-AA/L) converted back to all-atom model.
-
Increase the size of the target data. Because each force field was derived with a different training set of atomic configurations, it was biased in one way or another. By using large atomic reference sets and careful selection of the atomic configurations, bias problems were reduced and accuracy was improved.
- Improve representation of static electrostatic potential. The atom-centered point charge model has two shortcomings:
- It is unable to describe the anisotropy of the charge distribution;
- It does not account for the charge penetration effect (deviation of electrostatic interaction from the Coulomb form due to electron shielding when atomic electron clouds overlap).
In molecular complexes, these effects determine equilibrium geometry and energy. Examples of anisotropic charge distribution are σ-holes, lone pairs, and aromatic systems.
In covalent bonding an atom’s side opposite its bond usually has a lower electron density region known as σ-hole. Through a positive electrostatic potential associated with a sigma-hole, an atom can interact attractively with negative sites.
A simple solution to the σ-hole model is to attach an off-centered positive charge to the halogen atom ( W. Jorgensen & P. Schyman, 2012, F.Lin, A. MacKerell, 2018). An atomic multipole method provides a more thorough way of describing anisotropic charge distribution (J.Ponder et al., 2010).
- Include polarization effects. Early force fields employed fixed atomic charges to model the electrostatic interactions. Fixed-charge electrostatics does not account for the many-body polarization that can vary significantly depending on chemical and physical environments. Consequently, non-polarizable force fields fail to capture the conformational dependence of electrostatic properties. Polarizable force fields where the charges can be calculated from the energy equilibration have been developed ot address this problem (see a recent review). A drawback of including polarization is that simulations take longer to run due to the high computational cost.
Recent developments in force fields and future prospects
Despite extensive attempts to improve force files, they have often failed to achieve quantitative accuracy. There was a realization that the functional form of potential energy was the major problem with all widely used protein force fields.
Two strategies can be used to address this problem:
- Expand and improve the rigor of the representation of the underlying physics.
- Develop empirical corrections to compensate for deficiency of physical representation
AMBER, CHARMM, and OPLS focused their efforts on empirical correction of the simple potential function. AMOEBA and COMPASS worked on improving the functional form.
It is well understood that charge distributions are affected by both chemical environments and local geometry changes. As we discussed above, the former is explicitly treated in polarizable force field models. AMOEBA+ (2019) improved representation of electrostatic interactions by incorporating charge penetration and intermolecular charge transfer.
Dependence of atomic charges on the local molecular geometry is ignored by almost all classical force fields even though it is well known that it causes issues.
One of the few force fields that consider CF contributions is SDFF:
https://eurekamag.com/research/011/176/011176666.php
Palmo, K.; Mannfors, B.; Mirkin, N. G.; Krimm, S. Inclusion of charge and polarizability fluxes provides needed physical accuracy in molecular mechanics force fields. Chem. Phys. Lett. 2006, 429 (4), 628.
Recently Geometry-Dependent Charge Flux (GDCF) model considering charge flux contributions along bond and angle was introduced in AMOEBA+(CF) force field.
Implementation of Geometry-Dependent Charge Flux into the Polarizable AMOEBA+ Potential
Force Fields Aimed at Improving Quality of Molecular Interactions
CVFF (Consistent Valence Force Field) (Maple & Hagler, 1988)
CFF93 (An ab initio all-atom force field for polycarbonates) (Sun et al., 1994)
CFF, formerly CFF95 (Jonsdottir & Rasmussen, 2000)
MM3 (N.L.Allinger et al., 1989), MM4 (N.L.Allinger et al., 1996)
COMPASS (Condensed-phase Optimized Molecular Potentials for Atomistic Simulation Studies)
COMPASS (H. Sun, 1998)
Developed for simulations of organic molecules, inorganic small molecules, and polymers
COMPASS II (H. Sun et al., 2016)
Extended the coverage to polymer and drug-like molecules found in popular databases. The VDW parameters are obtained by fitting enthalpies of vaporization and densities, to experimental data. The atomic partial charges are derived using QM and empirically adjusted to take hydrogen bonding effects into account. The COMPASS energy function offers six types of cross-terms: bond-bond, bond-angle, angle-angle, bond-torsion, angle-torsion, and angle-torsion-angle.
Biomolecular Force Fields for Large-Scale Simulations.
Long simulations of large systems are required to study biologically relevant processes. It takes a lot of computer power to run such simulations. Due to this, the main objective is to develop a minimal force field that allows simulation sizes and time periods to be extended as much as possible while still keeping chemical structures, interaction energies, and thermodynamic properties within an acceptable range.
The workhorses of modern biomolecular simulations are all-atom, fixed-charge force fields:
- AMBER (Assisted Model Building with Energy Refinement)
- CHARMM (Chemistry at HARvard Macromolecular Mechanics)
- GROMOS (GROningen MOlecular Simulation)
- OPLS (Optimized Potentials for Liquid Simulations)
CHARMM and AMBER forcefields are developed for simulations of proteins and nucleic acids, and they focus on accurate description of structures and non-bonded energies. GROMOS and OPLS are geared toward accurate description of thermodynamic properties such as heats of vaporization, liquid densities, and molecular solvation properties.
AMBER
AMBER forcefields are developed for simulations of proteins and nucleic acids and they are focused on accurate description of structures and non-bonded energies. The VDW parameters are obtained from crystal structures and lattice energies. The atomic partial charges are fitted to QM electrostatic potential without any empirical adjustments.
ff99 (1999)
The main idea is that the use of RESP charges, should lead to the need for fewer torsional potentials than in models with an empirical charge derivation. Potential energy surface scans were performed using four different ab initio methods, HF/6‐31G*, MP2/6‐31G*, MP2/6‐311+G (2d,p), and B3LYP/6‐311+G (2d,p).
Wang J, Cieplak P, Kollman PA. How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? Journal of Computational Chemistry. 2000;21: 1049–1074. doi:10.1002/1096-987X(200009)21:12<1049::AID-JCC3>3.0.CO;2-F
After publication of ff99 a number of studies devoted primarily to modifying the torsion potentials in order to correct the observed discrepancies have been published:
ff03 (2003)
Completely new charge set developed using B3LYP/cc-pvTZ quantum calculations in a polarizable continuum (PCM) solvent intended to mimic the interior of a protein. Backbone torsion Fourier series were derived specifically for this new charge set at the MP2/cc-pvTZ level of theory, also in the context of PCM solvent. MM gas-phase energies computed with charges derived in PCM solvent have been shown to double-count polarization effects, and ff03 force field has not become as widely used or further refined as ff99
Duan, Yong, Chun Wu, Shibasish Chowdhury, Mathew C. Lee, Guoming Xiong, Wei Zhang, Rong Yang, et al. “A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations.” Journal of Computational Chemistry 24, no. 16 (2003): 1999–2012. https://doi.org/10.1002/jcc.10349.
ff99sb (2006)
Optimized for the correct description of the helix-coil equilibrium
ff99SB-φ’
Targeted the reproduction of the intrinsic conformational preferences of tripeptides
ff99SBnmr (2010) and ff99SB_φψ
Target data included protein NMR chemical shifts and residual dipolar couplings.
ff99SBildn (2010)
Targeted optimization of four amino acid side chains.
Lindorff‐Larsen, Kresten, Stefano Piana, Kim Palmo, Paul Maragakis, John L. Klepeis, Ron O. Dror, and David E. Shaw. “Improved Side-Chain Torsion Potentials for the Amber Ff99SB Protein Force Field.” Proteins: Structure, Function, and Bioinformatics 78, no. 8 (2010): 1950–58. doi:10.1002/prot.22711.
ff12SB (2012)
This is the preliminary version of ff14SB described in the same paper.
ff12SB-cMAP (2015)
Force field for implicit-solvent simulations.
ff99IDPs (2015)
Force field for intrinsically disordered proteins
ff14ipq (2014)
This force field is derived using another new approach aiming to derive charges implicitly polarized by the fixed charge explicit water (IPolQ method). The charges are derived in self-consistent manner in presence of explicit water molecules represented by TIP4P-Ew water model at MP2/cc-pV(T+d)Z level. The weak point of the ff14ipq force field is overstabilization of salt bridges.
Cerutti, David S., William C. Swope, Julia E. Rice, and David A. Case. Ff14ipq: A Self-Consistent Force Field for Condensed-Phase Simulations of Proteins. Journal of Chemical Theory and Computation 10, no. 10 (October 14, 2014): 4515–34. doi:10.1021/ct500643c.
ff14SB (2015)
Another attempt to corect for deficiencies of ff99SB by using new side chain dihedral parameters based on MP2 level calculations followed by adjustment to the backbone φ energy profile. Used the old ff99SB charge set.
Maier, James A., Carmenza Martinez, Koushik Kasavajhala, Lauren Wickstrom, Kevin E. Hauser, and Carlos Simmerling. “Ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from Ff99SB.” Journal of Chemical Theory and Computation 11, no. 8 (August 11, 2015): 3696–3713. https://doi.org/10.1021/acs.jctc.5b00255.
ff15ipq (2016)
The motivation for the development was to address the salt bridge overstabilization problem of ff14ipq. However, this forcefiled is not a small correction applied to ff14ipq. It is is a complete semi-automatic rederivation of all parameters with the different water model SPC/Eb. The salt bridge overstabilization issue was addressed by increasing radius of polar hydrogens bonded to nitrogen. The modified FF performed as well or better than the other fixed charge force fields. Polarizable CHARMM Drude-2013 and AMOEBA performed better in this respect, as expected.
Problems related to protein stability persist. Even 4 μs simulations “were not sufficiently long to obtain converged estimates of secondary structure of polypeptides”. In simulation tests some proteins deviated significantly near the end of several microsecond simulations, and it is not clear whether this is a transient fluctuation or transition to a different state.
Debiec, Karl T., David S. Cerutti, Lewis R. Baker, Angela M. Gronenborn, David A. Case, and Lillian T. Chong. Further along the Road Less Traveled: AMBER Ff15ipq, an Original Protein Force Field Built on a Self-Consistent Physical Model. Journal of Chemical Theory and Computation 12, no. 8 (August 9, 2016): 3926–47. doi:10.1021/acs.jctc.6b00567
ff15FB (2017)
The goal was to reoptimize intramolecular bond, angle, and dihedral parameters to fit MP2 level QM data without modifying the nonbonded parameters. The parameter optimization was done using ForceBalance package. Same peformance as the earlier versions for equilibrium properties, improved performance for temperature dependence. For best agreement with experiment recommended to use with the TIP3P-FB water. The TIP3P-FB model was parametrized to reproduce the temperature and pressure dependence of a wide range of thermodynamic properties.
ff19SB (2020)
Backbone dihedral parameters optimized using as reference data the entire 2D quantum mechanics (QM) energy surface. Both QM and MM calculations were done in aqueous solution. AMBER ff19SB uses CMAP torsional potentials. The authors concluded that “ff19SB, when combined with a more accurate water model such as OPC, should have better predictive power for modeling sequence-specific behavior, protein mutations, and also rational protein design”.
CHARMM
CHARMM19 (1996)
United-atom model, originally released in 1985.
CHARMM22 (1998)
All-atom model.
CHARMM27 (2000)
Update to the CHARMM22 featuring the optimization of nucleic acid and lipid parameters, as well as the introduction of a number of new ions.
CHARMM22/CMAP (2004)
This forcefield introduces a tabulated correction for the φ-, ψ-angular dependence of the potential energy. As a result of the application of CMAP correction, the dynamical and structural properties of proteins were significantly improved in molecular dynamics simulations.
CHARMM27r (2005)
Improved lipid parameter set.
CHARMM35 (2008)
Carbohydrate parameter set.
CGenFF (2009)
CHARMM36 (2012)
refined backbone CMAP potentials and introduced new side-chain dihedral parameters. The updated CMAP corrected the C22/CMAP FF bias towards alpha-helical conformations.
Review
CHARMM Force Field Files
GROMOS
53A5, 53A6 (Oostenbrink et al., 2004)
54A7, 54B7 (N.Schmid et al., 2011)
54A8 (M.M.Reif et al., 2012)
Parameter sets for GROMOS force fields are specified in accordance with the following format:
- The number of atom types
- A letter code that describes for what conditions the parameters are optimized (e.g. A - solution, B - vacuum).
- The version number
OPLS
Force fields of the OPLS family are designed for simulating liquids that contain organic molecules and proteins. The VDW parameters are optimized using experimental liquid properties, mainly enthalpies of vaporization and densities. The atomic partial charges are derived using QM and experimental condensed-phase properties. An important part of the OPLS philosophy is balancing solvent-solvent and solute-solvent interactions.
OPLS-AA (1996)
This is the first all-atom OPLS force field. Bond stretching and angle bending parameters are taken from the AMBER force field. The torsional parameters were fit to the RHF/6-31G* calculations of about 50 organic molecules and ions. The charges are empirical and have been obtained from fitting to reproduce properties of organic liquids.
Jorgensen W, Maxwell D, Tirado-Rives J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J Am Chem Soc. 1996;118: 11225–11236.
OPLS-AA/L (2001)
Large data set, more than 2000 data points of energies for the 20 amino acids based on geometry optimization at the HF/6-31G** level followed by single-point LMP2/cc-pVTZ(-f) calculations. This level of theory is accurate within 0.25 kcal/mol.
Kaminski GA, Friesner RA, Tirado-Rives J, Jorgensen WL. Evaluation and Reparametrization of the OPLS-AA Force Field for Proteins via Comparison with Accurate Quantum Chemical Calculations on Peptides. J Phys Chem B. 2001;105: 6474–6487. doi:10.1021/jp003919d
OPLS_2005 (2005)
Main goal - to broaden the coverage of OPLS_2001 and refine torsion parameters using a larger dataset. The new Data set included torsion profiles from 637 compounds.
Banks JL, Beard HS, Cao Y, Cho AE, Damm W, Farid R, et al. Integrated Modeling Program, Applied Chemical Theory (IMPACT). J Comput Chem. 2005;26: 1752–1780. doi:10.1002/jcc.20292
OPLS-AAx & OPLS/CM1Ax (2012)
The representation of chlorine, bromine, and iodine in aryl halides has been modified in the OPLS-AA and OPLS/CM1A force fields in order to incorporate halogen bonding. It was accomplished by adding a partial positive charge in the region of the σ-hole.
OPLS2.0 (2012)
OPLS2 was developed to improve the accuracy of drug-like molecules. Substantially expanded data set contained QM data on more than 11,000 molecules. CM1A-BCC charges were used. The BCC terms were parameterized against the OPLS-AA charges for a core set of 112 molecules and the electrostatic potential at the HF/6-31G* level. The BCC terms were empirically adjusted to minimize the errors with regard to the ASFE using a training set of 153 molecules.
Shivakumar D, Harder E, Damm W, Friesner RA, Sherman W. Improving the Prediction of Absolute Solvation Free Energies Using the Next Generation OPLS Force Field. J Chem Theory Comput. 2012;8: 2553–2558. doi:10.1021/ct300203w
OPLS-AA/M (2015)
New torsion parameters using higher level ωB97X-D(23)/6-311++G(d,p) and B2PLYP-D3BJ/aug-cc-pVTZ(26) QM calculations.
Robertson MJ, Tirado-Rives J, Jorgensen WL. Improved Peptide and Protein Torsional Energetics with the OPLS-AA Force Field. J Chem Theory Comput. 2015;11: 3499–3509. doi:10.1021/acs.jctc.5b00356
OPLS3 (2016)
Added off-atom charge sites to represent halogen bonding and aryl nitrogen lone pairs. Complete refit of peptide dihedral parameters using an order of magnitude more data. Claimed 30% improvement. Still the same original VDW parameters.
Harder E, Damm W, Maple J, Wu C, Reboul M, Xiang JY, et al. OPLS3: A Force Field Providing Broad Coverage of Drug-like Small Molecules and Proteins. J Chem Theory Comput. 2016;12: 281–296. doi:10.1021/acs.jctc.5b00864
Polarizable Force fields
AMBER
ff02pol (P. Cieplak et al., 2001)
ff12pol (J. Wang et al., 2011)
AMOEBA (Atomic Multipole Optimized Energetics for Biomolecular Applications)
AMOEBA-2002 (Ren and Ponder, 2002)
AMOEBA-2013 (Shi et al., 2013)
AMOEBA-2013 uses permanent electrostatic multipole (dipole and quadrupole) moments at each atom and explicitly treats polarization effects under various chemical and physical conditions.
AMOEBA+ (C. Liu et al., 2019)
AMOEBA+(CF) (C. Liu et al., 2020)
Q-AMOEBA (N.Mauger et al., 2022)
OPLS
OPLS/PFF https://onlinelibrary.wiley.com/doi/10.1002/jcc.10125
Force field for water with polarization capabilities. Intended for simulations that explicitly take nuclear quantum effects into account.
CHARMM
CHARMM fluctuating charge model (S.Patel et al., 2004)
CHARMM Drude model (P.E.M. Lopez, 2013)
CHARMM Drude-19 F.-Y. Lin, 2020
Drude parameters for DPPC J. Chowdhary, 2013
Drude parameters for DOPC and POPE H. Chu, 2017
Force Fields for Small Molecules
GAFF
CGenFF
- (F.-U.Lin & AD. MacKerell, 2020) Force Fields for Small Molecules
Additional Reading
-
(Dauber-Osguthorpe, 2019) Biomolecular force fields: where have we been, where are we now, where do we need to go and how do we get there? - Review of the origins of FF based calculations, theory and methodology of FF development.
-
(Hagler, 2019) Force field development phase II: Relaxation of physics-based criteria… or inclusion of more rigorous physics into the representation of molecular energetics. - The latest developments in improvement of FF accuracy and robustness are discussed.
-
Tinker-HP is a multi-CPUs and multi-GPUs/multi-precision, MPI massively parallel package dedicated to long molecular dynamics simulations with classical and polarizable force fields, neural networks and advanced QM/MM.
-
K. Vanommeslaeghe, A.D. MacKerell Jr., 2015. Review of additive and polarizable CHARMM force fields for biophysics and computer-aided drug design.
Key Points
There are different types of force fields designed for different types of simulations.
Induction effects are not accounted for by fixed-charge force fields.
Using more accurate and diverse target data allows MD force fields to be improved.