Practical considerations for Molecular Dynamics

Force Fields and Interactions

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • What is Molecular Dynamics and how can I benefit from using it?

  • What is a force field?

  • What mathematical energy terms are used in biomolecular force fields?

Objectives
  • Understand strengths and weaknesses of MD simulations

  • Understand how the interactions between particles in a molecular dynamics simulation are modeled

Introduction

MD-timeline: system-size vs. time Figure from: AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics


Recent example - simulation of the whole SARS-CoV-2 virion

Image: Simulation of SARS-CoV-2 with NAMD Figure from AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics


Goals of the Workshop

The focus will be on reproducibility and automation by introducing scripting and batch processing.


The theory behind the method of MD.

Force Fields

Interactions are approximated with a simple empirical potential energy function.

Flow diagram of MD process Flow diagram of MD simulation

A force field is a set of empirical energy functions and parameters used to calculate the potential energy U as a function of the molecular coordinates.

$U(\vec{r})=\sum{U_{bonded}}(\vec{r})+\sum{U_{non-bonded}}(\vec{r})$

Classification of force fields.

Class 1 force fields.

Examples: AMBER, CHARMM, GROMOS, OPLS

Class 2 force fields.

Examples: MMFF94, UFF

Class 3 force fields.

Examples of class 3 force fields are: AMOEBA, DRUDE

Energy Terms of Biomolecular Force Fields

Non-Bonded Terms

graph: Interactions

The Lennard-Jones potential

$V_{LJ}(r)=\frac{C12}{r^{12}}-\frac{C6}{r^{6}}$

graph: Lennard-Jones potential

$V_{LJ}(r)=4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12}-\left(\frac{\sigma}{r}\right)^{6}\right]$

$C12=4\epsilon\sigma^{12},C6=4\epsilon\sigma^{6}$

The Lennard-Jones Combining Rules

Combining rules

Geometric mean:

\(C12_{ij}=\sqrt{C12_{ii}\times{C12_{jj}}}\qquad C6_{ij}=\sqrt{C6_{ii}\times{C6_{jj}}}\qquad\) (GROMOS)

\(\sigma_{ij}=\sqrt{\sigma_{ii}\times\sigma_{jj}}\qquad\qquad\qquad \epsilon_{ij}=\sqrt{\epsilon_{ii}\times\epsilon_{jj}}\qquad\qquad\) (OPLS)

Lorentz–Berthelot:

\(\sigma_{ij}=\frac{\sigma_{ii}+\sigma_{jj}}{2},\qquad \epsilon_{ij}=\sqrt{\epsilon_{ii}\times\epsilon_{jj}}\qquad\) (CHARM, AMBER).

Less common combining rules.

Waldman–Hagler:

\(\sigma_{ij}=\left(\frac{\sigma_{ii}^{6}+\sigma_{jj}^{6}}{2}\right)^{\frac{1}{6}}\) , \(\epsilon_{ij}=\sqrt{\epsilon_{ij}\epsilon_{jj}}\times\frac{2\sigma_{ii}^3\sigma_{jj}^3}{\sigma_{ii}^6+\sigma_{jj}^6}\)

This combining rule was developed specifically for simulation of noble gases.

Hybrid (the Lorentz–Berthelot for H and the Waldman–Hagler for other elements). Implemented in the AMBER-ii force field for perfluoroalkanes, noble gases, and their mixtures with alkanes.

The Buckingham potential

$V_{B}(r)=Aexp(-Br) -\frac{C}{r^{6}}$

graph: Buckingham potential

There is only one combining rule for Buckingham potential in GROMACS:
$A_{ij}=\sqrt{(A_{ii}A_{jj})}$
$B_{ij}=2/(\frac{1}{B_{ii}}+\frac{1}{B_{jj}})$
$C_{ij}=\sqrt{(C_{ii}C_{jj})}$

Combining rule (GROMACS):
\(A_{ij}=\sqrt{(A_{ii}A_{jj})} \qquad B_{ij}=2/(\frac{1}{B_{ii}}+\frac{1}{B_{jj}}) \qquad C_{ij}=\sqrt{(C_{ii}C_{jj})}\)

Specifying Combining Rules

GROMACS

Combining rule is specified in the [defaults] section of the forcefield.itp file (in the column ‘comb-rule’).

[ defaults ]
; nbfunc        comb-rule       gen-pairs       fudgeLJ fudgeQQ
1               2               yes             0.5     0.8333

Geometric mean is selected by using rules 1 and 3; Lorentz–Berthelot rule is selected using rule 2.

GROMOS force field requires rule 1; OPLS requires rule 3; CHARM and AMBER require rule 2

The type of potential function is specified in the ‘nbfunc’ column: 1 selects Lennard-Jones potential, 2 selects Buckingham potential.

NAMD

By default, Lorentz–Berthelot rules are used. Geometric mean can be turned on in the run parameter file:

vdwGeometricSigma yes

The electrostatic potential

graph: electrostatic potential

Short-range and Long-range Interactions

Bonded Terms

The bond potential

graph: harmonic bond potential

The angle potential

graph: harmonic angle potential

The torsion (dihedral) angle potential

$V_{Dihed}=k_\phi(1+cos(n\phi-\delta)) + …$

graph: torsion/dihedral potential

graph: torsion/dihedral potential

The improper torsion potential

graph: improper-dihedral potential

Where the dihedral angle \(\phi\) is the angle between planes ijk and ijl.


Coupling Terms

The Urey-Bradley potential

graph: Urey-Bradley potential

CHARMM CMAP correction potential

graph: Phi Psi

graph: Phi Psi

Energy scale of potential terms

\(k_BT\) at 298 K ~ 0.593 \(\frac{kcal}{mol}\)
Bond vibrations ~ 100 ‐ 500 \(\frac{kcal}{mol \cdot \unicode{x212B}^2}\)
Bond angle bending ~ 10 - 50 \(\frac{kcal}{mol \cdot deg^2}\)
Dihedral rotations ~ 0 - 2.5 \(\frac{kcal}{mol \cdot deg^2}\)
van der Waals ~ 0.5 \(\frac{kcal}{mol}\)
Hydrogen bonds ~ 0.5 - 1.0 \(\frac{kcal}{mol}\)
Salt bridges ~ 1.2 - 2.5 \(\frac{kcal}{mol}\)

Exclusions from Non-Bonded Interactions

exclusions

What Information Can MD Simulations Provide?

With the help of MD it is possible to model phenomena that cannot be studied experimentally. For example

For more examples of the types of information MD simulations can provide read the review article: Molecular Dynamics Simulation for All.


Challenge: Counting Non-Bonded Interactions

How many non-bonded interactions are in the system with ten Argon atoms? 10, 45, 90, or 200?

Solution

Argon atoms are neutral, so there is no Coulomb interaction. Atoms don’t interact with themselves and the interaction ij is the same as the interaction ji. Thus the total number of pairwise non-bonded interactions is (10x10 - 10)/2 = 45.


Challenge: Exclusions

How many 1-4 interactions between carbon atoms are in the system composed of 100 pentane (H3C-CH2-CH2-CH2-CH3) molecules?

Solution

200


Challenge: Interaction potentials

Which potential term describes the interaction between two molecules? CMAP, Urey-Bradley, Lennard-Jones or angle?

Solution

The Lennard-Jones potential


Specifying Exclusions

GROMACS

The exclusions are generated by grompp as specified in the [moleculetype] section of the molecular topology .top file:

[ moleculetype ]
; name  nrexcl
Urea         3
...
[ exclusions ]
;  ai    aj
   1     2
   1     3
   1     4

In the example above non-bonded interactions between atoms that are no farther than 3 bonds are excluded (nrexcl=3). Extra exclusions may be added explicitly in the [exclusions] section.

The scaling factors for 1-4 pairs, fudgeLJ and fudgeQQ, are specified in the [defaults] section of the forcefield.itp file. While fudgeLJ is used only when gen-pairs is set to ‘yes’, fudgeQQ is always used.

[ defaults ]
; nbfunc        comb-rule       gen-pairs       fudgeLJ fudgeQQ
1               2               yes             0.5     0.8333

NAMD

Which pairs of bonded atoms should be excluded is specified by the exclude parameter.
Acceptable values: none, 1-2, 1-3, 1-4, or scaled1-4

exclude scaled1-4
1-4scaling 0.83
scnb 2.0

If scaled1-4 is set, the electrostatic interactions for 1-4 pairs are multiplied by a constant factor specified by the 1-4scaling parameter. The LJ interactions for 1-4 pairs are divided by scnb.

Key Points

  • Molecular dynamics simulates atomic positions in time using forces acting between atoms

  • The forces in molecular dynamics are approximated by simple empirical functions


Fast Methods to Evaluate Forces

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • Why the computation of non-bonded interactions is the speed limiting factor?

  • How are non-bonded interactions computed efficiently?

  • What is a cutoff distance?

  • How to choose the appropriate cutoff distance?

Objectives
  • Understand how non-bonded interactions are truncated.

  • Understand how a subset of particles for calculation of short-range non-bonded is selected.

  • Learn how to choose the appropriate cutoff distance and truncation method.

  • Learn how to select cutoff distance and truncation method in GROMACS and NAMD.

Challenges in calculation of non bonded interactions.

Neighbour Searching Methods

Cell Lists

Figure: Grid-cell lists

Verlet Lists

Figure: Verlet lists

Problems with Truncation of Lennard-Jones Interactions and How to Avoid Them?

Cutoff Methods Figure 1. The Distance Dependence of Potential and Force for Different Truncation Methods

Shifted potential  

$\circ$ Shift the whole potential uniformly by adding a constant at values below cutoff.
$\circ$ Avoids infinite forces.
$\circ$ Does not change forces at the distances below cutoff.
$\circ$ Introduces a discontinuity in the force at the cutoff distance.
$\circ$ Modifies total potential energy.
Shifted Force  

$\circ$ Shift the whole force so that it vanishes at the cutoff distance.
$\circ$ Modifies equations of motion at all distances.
$\circ$ Better results at shorter cutoff values compared to the potential shift.
Switching Function  

$\circ$ Modify the shape of the potential function near cutoff.
$\circ$ Forces are modified only near the cutoff boundary and they approach zero smoothly.

How to Choose the Appropriate Cutoff Distance?

For example for the O, N, C, S, and P atoms in the AMBER99 force field the values of \(\sigma\) are in the range 1.7-2.1, while for the Cs ions \(\sigma=3.4\). Thus the minimum acceptable cutoff, in this case, is 8.5.

Table 1. Cutoffs Used in Development of the Common Force Fields

AMBER CHARMM GROMOS OPLS
8 , 10 (ff19SB) 12 14 11-15 (depending on a molecule size)

Properties that are very sensitive to the choice of cutoff

For such quantities even a cutoff at 2.5 \(\sigma\) gives inaccurate results, and in some cases the cutoff larger than 6 \(\sigma\) was required for reliable simulations (Grosfils, 2009).

Effect of cutoff on energy conservation

Truncation of the Electrostatic Interactions


Challenge: Truncation of VDW Interactions

Which truncation method modifies VDW interactions only near the cut-off distance?

Solution

Switching functions


Selecting Neighbour Searching Methods

GROMACS

Neighbour searching is specified in the run parameter file mdp.

cutoff-scheme  =  group
; Generate a pair list for groups of atoms. Since version 5.1 group list has been deprecated and only Verlet scheme is available.

cutoff-scheme  =  Verlet
; Generate a pair list with buffering. The buffer size is automatically set based on verlet-buffer-tolerance, unless this is set to -1, in which case rlist will is used.

; Neighbour search method.
ns-type = grid
; Make a grid in the box and only check atoms in neighboring grid cells.

ns-type = simple
; Loop over every atom in the box.

nstlist = 5
; Frequency to update the neighbour list. If set to 0 the neighbour list is constructed only once and never updated. The default value is 10.

NAMD

When run in parallel NAMD uses a combination of spatial decomposition into grid cells (patches) and Verlet lists with extended cutoff distance.

stepspercycle 10
# Number of timesteps in each cycle. Each cycle represents the number of timesteps between atom reassignments. Default value is 20.

pairlistsPerCycle 2
# How many times per cycle to regenerate pairlists. Default value is 2.

Selecting LJ Potential Truncation Method

GROMACS

Truncation of LJ potential is specified in the run parameter file mdp.

vdw-modifier = potential-shift
;  Shifts the VDW potential by a constant such that it is zero at the rvdw.

vdw-modifier = force-switch
;  Smoothly switches the forces to zero between rvdw-switch and rvdw.

vdw-modifier = potential-switch
;  Smoothly switches the potential to zero between rvdw-switch and rvdw.

vdw-modifier = none

rvdw-switch = 1.0
;  Where to start switching.

rvdw = 1.2
;  Cut-off distance

NAMD

Truncation of LJ potential is specified in the run parameter file mdin.

cutoff 12.0
# Cut-off distance common to both electrostatic and van der Waals calculations

switching on
# Turn switching on/off. The default value is off.

switchdist 10.0
# Where to start switching

vdwForceSwitching on
# Use force switching for VDW. The default value is off.

AMBER force fields

AMBER force fields are developed with hard truncation. Do not use switching or shifting with these force fields.

Selecting Cutoff Distance

GROMACS

Cutoff and neighbour searching is specified in the run parameter file mdp.

rlist = 1.0
; Cutoff distance for the short-range neighbour list. Active when verlet-buffer-tolerance = -1, otherwise ignored.

verlet-buffer-tolerance = 0.002
; The maximum allowed error for pair interactions per particle caused by the Verlet buffer. To achieve the predefined tolerance the cutoff distance rlist is adjusted indirectly. To override this feature set the value to -1. The default value is 0.005 kJ/(mol ps).

NAMD

When run in parallel NAMD uses a combination of spatial decomposition into grid cells (patches) and Verlet lists with extended cutoff distance.

pairlistdist 14.0
# Distance between pairs for inclusion in pair lists. Should be bigger or equal than the cutoff. The default value is cutoff.

cutoff 12.0
# Local interaction distance. Same for both electrostatic and VDW interactions.

Key Points

  • The calculation of non-bonded forces is the most computationally demanding part of a molecular dynamics simulation.

  • Non-bonded interactions are truncated to speed up simulations.

  • The cutoff distance should be appropriate for the force field and the size of the periodic box.


Advancing Simulation in Time

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How is simulation time advanced in a molecular dynamics simulation?

  • What factors are limiting a simulation time step?

  • How to accelerate a simulation?

Objectives
  • Understand how simulation is advanced in time.

  • Learn how to choose time parameters and constraints for an efficient and accurate simulation.

  • Learn how to specify time parameters and constraints in GROMACS and NAMD.

Integration Algorithms

The Euler Algorithm

  1. Use $\boldsymbol{r}, \boldsymbol{v},\boldsymbol{a}$ at time $t$ to compute $\boldsymbol{r}(t+\delta{t})$ and $\boldsymbol{v}(t+\delta{t})$:

$\qquad\boldsymbol{r}(t+\delta{t})=\boldsymbol{r}(t)+\boldsymbol{v}(t)\delta{t}+\frac{1}{2}\boldsymbol{a}(t)\delta{t}^2$

$\qquad\boldsymbol{v}(t+\delta{t})=\boldsymbol{v}(t)+\frac{1}{2}\boldsymbol{a}(t)\delta{t}$

Drawbacks:

  1. Not energy conserving
  2. Not reversible in time

Applications

The original Verlet Algorithm

Verlet improved the Euler integration by using positions at two successive time steps. Using positions from two time steps ensured that acceleration changes were taken into account. Essentially, this algorithm is as follows: calculate the next positions using the current positions, forces, and previous positions:
$\qquad\boldsymbol{r}(t+\delta{t})=2\boldsymbol{r}(t)-\boldsymbol{r}(t-\delta{t})+\boldsymbol{a}(t)\delta{t}^2$

  • The Verlet algorithm (Verlet, 1967) requires positions at two time steps. It is inconvenient when starting a simulation when only current positions are available.

While velocities are not needed to compute trajectories, they are useful for calculating observables e.g. the kinetic energy. The velocities can only be computed once the next positions are calculated:

$\qquad\boldsymbol{v}(t+\delta{t})=\frac{r{(t+\delta{t})-r(t-\delta{t})}}{2\delta{t}}$

The Verlet algorithm is time-reversible and energy conserving.

The Velocity Verlet Algorithm

Euler integrator can be improved by introducing evaluation of the acceleration at the next time step. You may recall that acceleration is a function of atomic coordinates and is determined completely by interaction potential.

  1. Use $\boldsymbol{r}, \boldsymbol{v},\boldsymbol{a}$ at time $t$ to compute $\boldsymbol{r}(t+\delta{t})$: $\qquad\boldsymbol{r}(t+\delta{t})=\boldsymbol{r}(t)+\boldsymbol{v}(t)\delta{t}+\frac{1}{2}\boldsymbol{a}(t)\delta{t}^2$
  2. Derive $ \boldsymbol{a}(t+\delta{t})$ from the interaction potential using new positions $\boldsymbol{r}(t+\delta{t})$
  3. Use both $\boldsymbol{a}(t)$ and $\boldsymbol{a}(t+\delta{t})$ to compute $\boldsymbol{v}(t+\delta{t})$: $\quad\boldsymbol{v}(t+\delta{t})=\boldsymbol{v}(t)+\frac{1}{2}(\boldsymbol{a}(t)+\boldsymbol{a}(t+\delta{t}))\delta{t} $

The Velocity Verlet algorithm is mathematically equivalent to the original Verlet algorithm. It explicitly incorporates velocity, solving the problem of the first time step in the basic Verlet algorithm.

Leap Frog Variant of Velocity Verlet

Velocity, position, and forces are calculated using the following algorithm:

  1. Derive $ \boldsymbol{a}(t)$ from the interaction potential using positions $\boldsymbol{r}(t)$
  2. Use $\boldsymbol{v}(t-\frac{\delta{t}}{2})$ and $\boldsymbol{a}(t)$ to compute $\boldsymbol{v}(t+\frac{\delta{t}}{2})$: $\qquad\boldsymbol{v}(t+\frac{\delta{t}}{2})=\boldsymbol{v}(t-\frac{\delta{t}}{2}) + \boldsymbol{a}(t)\delta{t}$
  3. Use current $\boldsymbol{r}(t)$ and $\boldsymbol{v}(t+\frac{\delta{t}}{2})$ to compute $\boldsymbol{r}(t+\delta{t})$ : $\qquad\boldsymbol{r}(t+\delta{t})=\boldsymbol{r}(t)+\boldsymbol{v}(t+\frac{\delta{t}}{2})\delta{t}$

Discontinuities in simulations occur when the integrator is changed

Since velocities in Leap-frog restart files are shifted by half a time step from coordinates, changing the integrator to Verlet will introduce some discontinuity.

Integrator pairs of coordinates ($r$) and velocities ($v$)
Velocity-Verlet $[r_{t=0}, v_{t=0}], [r_{t=1}, v_{t=1}], [r_{t=2}, v_{t=2}], \ldots $
Leap-Frog $[r_{t=0}, v_{t=0.5}], [r_{t=1}, v_{t=1.5}], [r_{t=2}, v_{t=2.5}], \ldots $

A simulation system will experience an instantaneous change in kinetic energy. That makes it impossible to transition seamlessly between integrators.

Selecting the Integrator

GROMACS

Several integration algorithms available in GROMACS are specified in the run parameter mdp file.

integrator = md
; A leap frog algorithm

integrator = md-vv
;  A velocity Verlet algorithm

integrator = md-vv-avek
; A velocity Verlet algorithm same as md-vv except the kinetic energy is calculated as the average of the two half step kinetic energies. More accurate than the md-vv.

integrator = sd
;  An accurate leap frog stochastic dynamics integrator.

integrator = bd
; A Euler integrator for Brownian or position Langevin dynamics.

NAMD

The only available integration method is Verlet.

How to Choose Simulation Time Step?

Larger time step allows to run simulation faster, but accuracy decreases.

Other ways to increase simulation speed

Specifying Time Parameters

GROMACS

Time parameters are specified in the mdp run parameter file.

dt = 0.001
; Time step, ps

nsteps = 10000
; Number of steps to simulate

tinit = 0
; Time of the first step

NAMD

Time parameters are specified in the mdin run parameter file.

TimeStep = 1
# Time step, fs

NumSteps = 10000
# Number of steps to simulate

FirstTimeStep = 0
# Time of the first step

#  Multiple time stepping parameters
nonbondedFreq 2
#  Number of timesteps between short-range non-bonded evaluation.

fullElectFrequency 4
# Number of timesteps between full electrostatic evaluations

Constraint Algorithms

In constrained simulation first the unconstrained step is done, then corrections are applied to satisfy constraints.

SETTLE

SHAKE

Extensions of the original SHAKE algorithm: RATTLE, QSHAKE, WIGGLE, MSHAKE, P-SHAKE.

LINCS


Choosing simulation time step

Which of the following methods allows for a time step of 4 fs?

  1. Constraining bonds involving hydrogen atoms
  2. Hydrogen mass repartitioning
  3. Constraining bonds and angles involving hydrogen atoms
  4. Constraining angles between heavy atoms

Solution

Hydrogen mass repartitioning


Specifying Constraints

GROMACS

SHAKE, LINCS and SETTLE constraint algorithms are implemented. They are selected via keywords in mdp input files

constraints = h-bonds
; Constrain bonds with hydrogen atoms

constraints = all-bonds
; Constrain all bonds

constraints = h-angles
; Constrain all bonds and additionally the angles that involve hydrogen atoms

constraints = all-angles
; Constrain all bonds and angles

constraint-algorithm = LINCS
; Use LINCS

constraint-algorithm = SHAKE
; Use SHAKE

shake-tol = 0.0001
;  Relative tolerance for SHAKE, default value is 0.0001.

SETTLE can be selected in the topology file:

[ settles ]
; OW    funct   doh     dhh
1       1       0.1     0.16333

NAMD

SHAKE and SETTLE constraint algorithms are implemented. They are selected via keywords in simulation input file.

rigidBonds water
# Use SHAKE to constrain bonds with hydrogens in water molecules.

rigidBonds all
# Use SHAKE to constrain bonds with hydrogens in all molecules.

rigidBonds none
# Do not constrain bonds. This is the default.

rigidTolerance 1.0e-8
# Stop iterations when all constrained bonds differ from the nominal bond length by less than this amount. Default value is 1.0e-8.

rigidIterations 100
# The maximum number of iterations. If the bond lengths do not converge, a warning message is emitted. Default value is 100.

rigidDieOnError on
# Exit and report an error if rigidTolerance is not achieved after rigidIterations. The default value is on.

useSettle on
# If rigidBonds are enabled then use the SETTLE algorithm to constrain waters. The default value is on.

Key Points

  • A good integration algorithm for MD should be time-reversible and energy conserving.

  • The most widely used integration method is the velocity Verlet.

  • Simulation time step must be short enough to describe the fastest motion.

  • Time step can be increased if bonds involving hydrogens are constrained.

  • Additional time step increase can be achieved by constraining all bonds and angles involving hydrogens.


Periodic Boundary Conditions

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How to simulate a bulk (gas, liquid or solid) system by using only a small part?

Objectives
  • Understand why and when periodic boundary conditions are used.

  • Understand how shape and size of a periodic box can affect simulation.

  • Learn how to set periodic box parameters in GROMACS and NAMD.

What is PBC and why is it important?

Figure: Periodic Boundary Conditions

Choosing periodic box size and shape.

Box shape

Cubic periodic box

Figure: Cubic box

Octahedral and dodecahedral periodic boxes

  Space filling with truncated octahedrons
Figure: truncated Octahedron

Triclinic periodic boxes

Figure: Triclinic Cell

Box size

10 nm margin

Figure: Periodic Boundary Conditions

Pitfalls

Figure: Periodic Boundary Conditions


Periodic box

Which of the statements is incorrect?

  1. The smallest dimension of a periodic box must be at least as large as the cut-off radius
  2. The minimum box size should extend at least 10 nm from the solute
  3. The smallest dimension of a periodic box must be at least as large as the double cut-off radius
  4. There is an equivalent triclinic unit cell for any repeating shape that occupies all of space

Solution

The smallest dimension of a periodic box must be at least as large as the cut-off radius


Specifying Periodic Box

GROMACS

The box specification is integrated into structure files. The box parameters can be set using the editconf program or manually. The editconf program accepts the following options:

-bt Box type triclinic, cubic, dodecahedron, octahedron
-box Box vectors lengths, a, b, c nm
-angles Box vectors angles, bc, ac, ab degrees
-d Distance between the solute and the box nm

Example:

module load StdEnv/2020 gcc gromacs
wget http://files.rcsb.org/view/1lyz.pdb
gmx pdb2gmx -f 1lyz.pdb -ff amber99sb-ildn -water spce -ignh
gmx editconf -f conf.gro -o conf_boxed.gro -d 1.0 -bt cubic

In the example above the editconf program will append box vectors to the structure file ‘conf.gro’ and save it in the file ‘conf_boxed.gro’. The 9 components of the three box vectors are saved in the last line of the structure file in the order: xx yy zz xy xz yx yz zx zy. Three of the values (xy, xz, and yz) are always zeros because they are duplicates of (yx, zx, and zy). The values of the box vectors components are related to the unit cell vectors \(a,b,c,\alpha,\beta,\gamma\) from the CRYST1 record of a PDB file with the equations:

\[xx=a, yy=b\cdot\sin(\gamma), zz=\frac{v}{(a*b*\sin(\gamma))}\] \[xy=0, xz=0, yx=b\cdot\cos(\gamma)\] \[yz=0, zx=c\cdot\cos(\beta), zy=\frac{c}{\sin(\gamma)}\cdot(cos(\alpha)-cos(\beta)\cdot\cos(\gamma))\] \[v=\sqrt{1-\cos^2(\alpha)-cos^2(\beta)-\cos^2(\gamma) +2.0\cdot\cos(\alpha)\cdot\cos(\beta)\cdot\cos(\gamma)}\cdot{a}\cdot{b}\cdot{c}\]

NAMD

Periodic box is specified in the run parameter file by three unit cell vectors, the units are .

# cubic box
cellBasisVector1 100 0 0
cellBasisVector2 0 100 0
cellBasisVector3 0 0 100

Alternatively periodic box parameters can be read from the .xsc (eXtended System Configuration) file by using the extendedSystem keyword. If this keyword is used cellBasisVectors are ignored. NAMD always generates .xsc files at runtime.

extendedSystem restart.xsc

Comparing periodic boxes

Using the structure file ‘conf.gro’ from the example above generate triclinic, cubic, dodecahedral and truncated octahedral boxes with the 15 distance between the solute and the box edge.

Which of the boxes will be the fastest to simulate?

References:

  1. Molecular dynamics simulations with constrained roto-translational motions: Theoretical basis and statistical mechanical consistency
  2. The effect of box shape on the dynamic properties of proteins simulated under periodic boundary conditions
  3. Periodic box types in Gromacs manual

Key Points

  • Periodic boundary conditions are used to approximate an infinitely large system.

  • Periodic box should not restrict molecular motions in any way.

  • The macromolecule shape, rotation and conformational changes should be taken into account in choosing the periodic box parameters.


Degrees of Freedom

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How is kinetic energy contained and distributed in a dynamic molecular system

  • How to increase efficiency of thermodynamic sampling

  • How reduction of degrees of freedom affects dynamics

Objectives
  • Explain how kinetic energy is distributed in a simulation system

  • Understand how to correctly increase efficiency of thermodynamic sampling

Molecular degrees of freedom

Equipartition theorem

Degrees of freedom and thermodynamics properties

Translational degrees of freedom

Rotational degrees of freedom

Vibrational degrees of freedom

Angle bend Symmetric stretch Asymmetric stretch
bend symmetric asymmetric

Increasing efficiency of thermodynamic sampling.

By reducing the number of degrees of freedom we can increase thermodynamic sampling efficiency.

Reduction of the number of degrees of freedom may lead to artifacts.

Bond and angle constraints can:

Key Points

  • Constraints decrease the number of degrees of freedom

  • Imposing constraints can affect simulation outcome


Electrostatic Interactions

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How electrostatic interactions are calculated in periodic systems?

Objectives
  • Learn what parameters control the accuracy of electrostatic calculations

Coulomb interactions

graph: electrostatic potential

Particle Mesh Ewald (PME)

Graph: PME Decomposition

Fast decaying short-ranged potential (Particle part).

Slow decaying long-ranged potential (Mesh part).

PME algorithm

Image: PME Grid

  1. Assign charges to grid cells. Charges in grid cells are obtained by interpolation.
  2. Compute Fourier transform.
  3. Compute potential. Coulomb interaction decays rapidly in Fourier space, and summation converges fast.
  4. Compute inverse Fourier transform.
  5. Interpolate gridded potentials back to atomic centers.

Simulation parameters controlling speed and accuracy of PME calculations.


Challenge: Electrostatic interactions

Which of the following statements is incorrect?
For more accurate electrostatic calculations you need to:

  1. Decrease grid spacing
  2. Increase the grid dimensions
  3. Increase direct space tolerance
  4. Increase the interpolation order

Solution

Increase direct space tolerance


PME variables

Variable \ MD package GROMACS NAMD AMBER
Fourier grid spacing fourierspacing (1.2) PMEGridSpacing (1.5)  
Grid Dimension [X,Y,Z] fourier-[nx,ny,nz] PMEGridSize[X,Y,Z] nfft[1,2,3]
Direct space tolerance ewald-rtol (\(10^{-5}\)) PMETolerance (\(10^{-5}\)) dsum_tol (\(10^{-6}\))
Interpolation order pme-order (4) PMEInterpOrder (4) order (4)

Key Points

  • Calculation of electrostatic potentials is the most time consuming part of any MD simulation

  • Long-range part of electrostatic interactions is calculated by approximating Coulomb potentials on a grid

  • Denser grid increases accuracy, but significantly slows down simulation


Controlling Temperature

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • What is temperature on the molecular level?

  • Why do we want to control the temperature of our MD-simulations?

  • What temperature control algorithms are commonly used?

  • What are the strengths and weaknesses of these common thermostats?

Objectives
  • Remind us how Maxwell-Boltzmann distributions relate to temperature.

  • Quickly review thermodynamic ensembles.

  • Learn about different thermostats and get an idea how they work.

  • Learn where thermostats that don’t produce correct thermodynamic ensembles can still be very useful.

Temperature at the molecular level.

\[f_v(v)=\left(\frac{m}{2\pi k_B T}\right)^{3/2}\cdot4\pi v^2\cdot\exp({-mv^2/2k_B T})\]

Maxwell-Boltzmann distributions

Krypton at different temperatures   Noble gases with different mass at 298K

Plot of velocity distributions

Velocity distributions obtained from MD simulations of water at different temperature.

Plot of Maxwell-Boltzmann distributions

Thermodynamic ensembles

MD simulations are usually simulate one of the following thermodynamic ensembles:

  1. The microcanonical or constant NVE ensemble.
  2. The canonical or constant NVT ensemble.
  3. The isothermal-isobaric or constant NPT ensemble.

Temperature Control Algorithms

  1. Strong coupling methods
  2. Weak coupling methods
  3. Stochastic methods
  4. Extended system dynamics

1. Strong coupling methods

Velocity rescaling

Velocity reassignment

Downsides:

2. Weak coupling methods

Berendsen thermostat

Downsides:

Heat flows between the simulation system and the heat bath with the rate defined by a time constant \(\tau_T\)

3. Stochastic methods

Randomly assign a subset of atoms new velocities based on Maxwell-Boltzmann distributions for the target temperature. Randomization interferes with correlated motion and thus slows down the system’s kinetics.

Andersen thermostat

The Andersen thermostat:

The Lowe-Andersen thermostat

Bussi stochastic velocity rescaling thermostat

Langevin thermostat

4. Extended system thermostats

Nosé-Hoover thermostat

Drawbacks:

The time constant parameter in this thermostat controls the period of temperature fluctuations at equilibrium.

Nosé-Hoover-chains

Global and local thermostats


Challenge: Thermodynamic ensembles

What thermodynamic ensemble describes an isolated system?

  1. Canonical
  2. Grand canonical
  3. Isothermal-isobaric
  4. Microcanonical

Solution

Microcanonical


Challenge: Thermostats

Which of the following statements is incorrect?

  1. Stochastic temperature control methods impair conformational transitions
  2. The Berendsen thermostat is very useful for heating simulation systems
  3. Extended system thermostats control temperature without random velocity rescaling
  4. Local thermostats work well for small groups of atoms

Solution

Local thermostats work well for small groups of atoms

Specifying local thermostats

With NAMD it is possible to set coupling coefficients for each atom in occupancy or beta column of a pdb file:

langevinFile
langevinCol

tCoupleFile
tCoupleFCol

In GROMACS temperature of a selected groups of atoms can be controlled independently using tc-grps. Temperature coupling groups are coupled separately to temperature bath.

Selecting thermostats in molecular dynamics packages

Thermostat/MD package GROMACS NAMD AMBER
velocity rescaling   reascaleFreq (steps)  
velocity reassignment   reassignFreq (steps)  
Andersen tcoupl = andersen    
massive-Andersen tcoupl = andersen-massive   ntt = 2
Lowe-Andersen   loweAndersen on  
Berendsen tcoupl = berendsen tCouple on ntt = 1
Langevin   langevin on ntt = 3
Bussi tcoupl = V-rescale stochRescale on  
Nose-Hoover tcoupl = nose-hoover    
Nose-Hoover-chains nh-chain-length (default 10)    

Key Points

  • On the molecular level, temperature manifests itself as a number of particles having a certain average kinetic energy.

  • Some temperature control algorithms (e.g. the Berendsen thermostat) fail to produce kinetic energy distributions that represent a correct thermodynamic ensemble.

  • Other thermostats, like Nosé-Hoover, produce correct thermodynamic ensembles but can take long to converge.

  • Even though the the Berendsen thermostat fails to produce correct thermodynamic ensembles, it can be useful for system relaxation as it is robust and converges fast.


Controlling Pressure

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • Why do we want to control the pressure of our MD-simulations?

  • What pressure control algorithms are commonly used?

  • What are the strengths and weaknesses of these common barostats?

Objectives
  • Identify the appropriate method for controlling pressure

Introduction

The role of pressure control algorithms is to keep pressure in the simulation system constant or to apply an external stress to the simulated system.

$ \qquad {P}=\frac{NK_{B}T}{V}+\frac{1}{3V}\langle\sum{r_{ij}F_{ij}}\rangle$

Pressure Control Algorithms

  1. Weak coupling methods
  2. Extended system methods
  3. Stochastic methods
  4. Monte-Carlo methods

1. Weak coupling methods

Berendsen pressure bath coupling.

Downsides:

The time constant for pressure bath coupling is the main parameter of the Berendsen thermostat. The pressure of the system is corrected such that the deviation exponentially decays with a lifetime defined by this constant.

Reference: Molecular dynamics with coupling to an external bath

2. Extended system methods

Parrinello-Rahman barostat

Downsides:

Nosé-Hoover barostat

References: [Hoover, 1986], [Martyna, 1994].

MTTK (Martyna-Tuckerman-Tobias-Klein) barostat.

3. Stochastic methods

Langevin piston pressure control.

Reference: Constant pressure molecular dynamics simulation: The Langevin piston method

MTTK and Langevin barostats produce identical ensembles Langevin barostat oscillates less then MTTK and converges faster due to stochastic collisions and damping.

Comparison of Barostats

Reprinted with permission from Rogge et al. 2015, A Comparison of Barostats for the Mechanical Characterization of Metal−Organic Frameworks, J Chem Theory Comput. 2015;11: 5583-97. doi:10.1021/acs.jctc.5b00748. Copyright 2015 American Chemical Society.

Stochastic Cell Rescaling

Reference: [Bernetti and Bussi (2020)]

4. Monte-Carlo pressure control.

References:

  1. Molecular dynamics simulations of water and biomolecules with a Monte Carlo constant pressure algorithm
  2. Constant pressure hybrid Molecular Dynamics–Monte Carlo simulations

Pitfalls


Selecting barostats in molecular dynamics packages

Thermostat\MD package GROMACS NAMD AMBER
Berendsen pcoupl = Berendsen BerendsenPressure on barostat = 1
Stoch. cell rescaling pcoupl = C-rescale    
Langevin   LangevinPiston on  
Monte-Carlo     barostat = 2
Parrinello-Rahman pcoupl = Parrinello-Rahman    
MTTK pcoupl = MTTK    

Key Points

  • Each barostat or thermostat technique has its own limitations and it is your responsibility to choose the most appropriate method or their combination for the problem of interest.


Water models

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • Why do we want to use water in out simulations?

  • What water models are commonly used?

  • How to chose a model of water for a simulation?

Objectives
  • Learn how water molecules are modeled in simulations

  • Understand how choice of water model can affect a simulation

Introduction

Realistic water environment is essential for accurate simulation of biological molecules.

Two approaches to solvation:

  1. Add explicit water molecules.
  2. Treat water as a continuous medium instead of individual molecules.

Continuum models

Most molecular dynamics simulations are carried out with explicit water molecules.

Explicit models

An early water model.

TIP3P water model

How are water models derived?

A good water model must faithfully reproduce six bulk properties of water:

Many water models with different level of complexity (the number of interaction points) have been developed. We will discuss only the models most widely used in simulations, and refer you to the excellent article in Wikipedia for an overview of all water models.

3-charge 3-point models.

   
TIP3P (transferable intermolecular potential)

$\circ$ Rigid geometry closely matching actual water molecule.
Water Models
SPC/E (simple point charge)

$\circ$ More obtuse tetrahedral angle of 109.47°.
$\circ$ Adds an average polarization correction to the potential energy function.
$\circ$ Reproduces density and diffusion constant better than the original SPC model.
Water Models

3-charge 4-point models.

   
TIP4P-Ew

$\circ$ Improves association/dissociation balance compared to 3-point models.
Water Models

Water models have their limitations.


Challenges in developing water models.

Charge distribution of the water molecule

Optimal Point Charges (OPC and OPC3)

Water Models

Polarizable water model OPC3-pol

Quality of different water models

Quality scores of water models

The distribution of quality scores for different water models in the space of dipole (μ) and quadruple (QT) moments. Figures from [2].

Performance Considerations

Other things to consider

Force Field Parameters of the common Water Models

  TIP3P SPC/E TIP4P-Ew OPC OPC3
OH 0.9572 1.0 0.9572 0.8724 0.9789
HH 1.5136 1.63 1.5136 1.3712 1.5985
HOH 104.52 109.47 104.52 103.6 109.47
OM - - 0.125 0.1594 -
A(12) 582.0 629.4 656.1 865.1 667.9
B(6) 595.0 625.5 653.5 858.1  
qO −0.834 −0.8476 −1.04844 −1.3582 -0.8952
qH +0.417 +0.4238 +0.52422 +0.6791 +0.4476

Nonbonded OPC3: sigma=1.7814990, epsilon=0.163406

References

  1. Structure and Dynamics of the TIP3P, SPC, and SPC/E Water Models at 298 K
  2. Building Water Models: A Different Approach
  3. Effect of the Water Model in Simulations of Protein–Protein Recognition and Association
  4. Accuracy limit of rigid 3-point water models
  5. A fast polarizable water model for atomistic simulations

Key Points

  • Continuum models cannot reproduce the microscopic details of the protein–water interface

  • Water–water interactions dominate the computational cost of simulations

  • Good water model needs to be fast and accurate in reproduction of the bulk properties of water


End of Workshop

Overview

Teaching: min
Exercises: min
Questions
Objectives

Key Points


Supplemental: Overview of the Common Force Fields

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • What are the categories of molecular dynamics force fields?

  • What force fields are available?

  • What systems they work best with?

Objectives
  • Be able to recognize the strengths and weaknesses of different types of force fields.

  • Find out which force fields are available and which systems they are most suitable for.

  • Identify the original papers that introduced force fields.

Early Force Fields

CFF (Consistent Force Field) (1968)

CFF was the first modern force field (Lifson, 1968) Introduced a concept of a ‘consistent force field’. Introduced a methodology for deriving and validating force fields. The term ‘consistent’ emphasized importance of the ability to describe a wide range of compounds and physical observables (conformation, crystal structure, thermodynamic properties and vibrational spectra). After the initial derivation for hydrocarbons CFF was extended to proteins, but was very crude at that time.

Allinger Force Fields (MM1 - MM4) (1976-1996)

Target data included electron diffraction, vibrational spectra, heats of formation, and crystal structures. The calculation results were verified via comparison with high-level ab-initio quantum chemistry computations, and the parameters were additionally adjusted. Intended for calculations on small and medium size organic molecules.

ECEPP (Empirical Conformational Energy Program for Peptides) (1975)

ECEPP was the first force field targeting polypeptides and proteins. Crystal data of small organic compounds and semi-empirical QM calculations were used extensively in derivation of this force field. As more experimental data became available, force field parameters have been refined and modified versions were published.

Evolution of Force Fields

Early force field advances

Early force field development focused on developing mathematical forms for MM energy function and methods of deriving parameters. Researchers investigated various forms of potential energy functions, and experimented with hydrogen bonding potential, combination rules, and out of plane angle potentials during this period.

United atoms force fields.

United Atoms Model was developed to speed up large-scale simulations. It represents nonpolar carbons and their bonded hydrogens as a single particle. United Atoms force fields can significantly reduce the size of most simulations, since roughly half of the atoms in biological or other organic macromolecules are hydrogens. Additional advantage is the efficiency gain in conformational sampling. The first united atoms force field was UNICEPP (L.G. Dunfield, 1978).

According to early comparisons between all-atom and united-atom simulations, united-atom force fields adequately represent molecular vibrations and bulk properties of small molecules.

After this initial success all major developers of protein force fields implemented united atoms models:

It became apparent, however, that there were some limitations:

New approaches were found to overcome the limitations of united-atom force fields. For example, only aliphatic hydrogens, which are not significantly charged and do not participate in hydrogen bonds, are represented as united atoms while other hydrogens are represented explicitly. In this way, the limitations of the united-atom force field are partially mitigated while preserving most of the benefits of the united-atom force field.

Refinement after the initial introduction.

Statistical errors caused by relatively short simulation lengths and systematic errors caused by inaccurate force fields limit the predictive power of MD simulations. Deficiencies of the force fields remained undetected when statistical errors caused by insufficient sampling prevailed. Increase of the computing power over last two decades allowed for much longer simulations and resulted in a significant reduction of statistical errors. This led to the detection of force field deficiencies such as large deviations in different observables and inability to predict conformations of proteins and peptides. Various approaches were undertaken to improve force fields such as:

Recent developments in force fields and future prospects

Despite extensive attempts to improve force files, they have often failed to achieve quantitative accuracy. There was a realization that the functional form of potential energy was the major problem with all widely used protein force fields.

Two strategies can be used to address this problem:

  1. Expand and improve the rigor of the representation of the underlying physics.
  2. Develop empirical corrections to compensate for deficiency of physical representation

AMBER, CHARMM, and OPLS focused their efforts on empirical correction of the simple potential function. AMOEBA and COMPASS worked on improving the functional form.

It is well understood that charge distributions are affected by both chemical environments and local geometry changes. As we discussed above, the former is explicitly treated in polarizable force field models. AMOEBA+ (2019) improved representation of electrostatic interactions by incorporating charge penetration and intermolecular charge transfer.

Dependence of atomic charges on the local molecular geometry is ignored by almost all classical force fields even though it is well known that it causes issues.

One of the few force fields that consider CF contributions is SDFF:

https://eurekamag.com/research/011/176/011176666.php

Palmo, K.; Mannfors, B.; Mirkin, N. G.; Krimm, S. Inclusion of charge and polarizability fluxes provides needed physical accuracy in molecular mechanics force fields. Chem. Phys. Lett. 2006, 429 (4), 628.

Recently Geometry-Dependent Charge Flux (GDCF) model considering charge flux contributions along bond and angle was introduced in AMOEBA+(CF) force field.

Implementation of Geometry-Dependent Charge Flux into the Polarizable AMOEBA+ Potential

Force Fields Aimed at Improving Quality of Molecular Interactions

CVFF (Consistent Valence Force Field) (Maple & Hagler, 1988)

CFF93 (An ab initio all-atom force field for polycarbonates) (Sun et al., 1994)

CFF, formerly CFF95 (Jonsdottir & Rasmussen, 2000)

MM3 (N.L.Allinger et al., 1989), MM4 (N.L.Allinger et al., 1996)

COMPASS (Condensed-phase Optimized Molecular Potentials for Atomistic Simulation Studies)

COMPASS (H. Sun, 1998)

Developed for simulations of organic molecules, inorganic small molecules, and polymers

COMPASS II (H. Sun et al., 2016)

Extended the coverage to polymer and drug-like molecules found in popular databases. The VDW parameters are obtained by fitting enthalpies of vaporization and densities, to experimental data. The atomic partial charges are derived using QM and empirically adjusted to take hydrogen bonding effects into account. The COMPASS energy function offers six types of cross-terms: bond-bond, bond-angle, angle-angle, bond-torsion, angle-torsion, and angle-torsion-angle.

Biomolecular Force Fields for Large-Scale Simulations.

Long simulations of large systems are required to study biologically relevant processes. It takes a lot of computer power to run such simulations. Due to this, the main objective is to develop a minimal force field that allows simulation sizes and time periods to be extended as much as possible while still keeping chemical structures, interaction energies, and thermodynamic properties within an acceptable range.

The workhorses of modern biomolecular simulations are all-atom, fixed-charge force fields:

CHARMM and AMBER forcefields are developed for simulations of proteins and nucleic acids, and they focus on accurate description of structures and non-bonded energies. GROMOS and OPLS are geared toward accurate description of thermodynamic properties such as heats of vaporization, liquid densities, and molecular solvation properties.

AMBER

AMBER forcefields are developed for simulations of proteins and nucleic acids and they are focused on accurate description of structures and non-bonded energies. The VDW parameters are obtained from crystal structures and lattice energies. The atomic partial charges are fitted to QM electrostatic potential without any empirical adjustments.

ff99 (1999)

The main idea is that the use of RESP charges, should lead to the need for fewer torsional potentials than in models with an empirical charge derivation. Potential energy surface scans were performed using four different ab initio methods, HF/6‐31G*, MP2/6‐31G*, MP2/6‐311+G (2d,p), and B3LYP/6‐311+G (2d,p).

Wang J, Cieplak P, Kollman PA. How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? Journal of Computational Chemistry. 2000;21: 1049–1074. doi:10.1002/1096-987X(200009)21:12<1049::AID-JCC3>3.0.CO;2-F

After publication of ff99 a number of studies devoted primarily to modifying the torsion potentials in order to correct the observed discrepancies have been published:

ff03 (2003)

Completely new charge set developed using B3LYP/cc-pvTZ quantum calculations in a polarizable continuum (PCM) solvent intended to mimic the interior of a protein. Backbone torsion Fourier series were derived specifically for this new charge set at the MP2/cc-pvTZ level of theory, also in the context of PCM solvent. MM gas-phase energies computed with charges derived in PCM solvent have been shown to double-count polarization effects, and ff03 force field has not become as widely used or further refined as ff99

Duan, Yong, Chun Wu, Shibasish Chowdhury, Mathew C. Lee, Guoming Xiong, Wei Zhang, Rong Yang, et al. “A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations.” Journal of Computational Chemistry 24, no. 16 (2003): 1999–2012. https://doi.org/10.1002/jcc.10349.

ff99sb (2006)

Optimized for the correct description of the helix-coil equilibrium

ff99SB-φ’

Targeted the reproduction of the intrinsic conformational preferences of tripeptides

ff99SBnmr (2010) and ff99SB_φψ

Target data included protein NMR chemical shifts and residual dipolar couplings.

ff99SBildn (2010)

Targeted optimization of four amino acid side chains.

Lindorff‐Larsen, Kresten, Stefano Piana, Kim Palmo, Paul Maragakis, John L. Klepeis, Ron O. Dror, and David E. Shaw. “Improved Side-Chain Torsion Potentials for the Amber Ff99SB Protein Force Field.” Proteins: Structure, Function, and Bioinformatics 78, no. 8 (2010): 1950–58. doi:10.1002/prot.22711.

ff12SB (2012)

This is the preliminary version of ff14SB described in the same paper.

ff12SB-cMAP (2015)

Force field for implicit-solvent simulations.

ff99IDPs (2015)

Force field for intrinsically disordered proteins

ff14ipq (2014)

This force field is derived using another new approach aiming to derive charges implicitly polarized by the fixed charge explicit water (IPolQ method). The charges are derived in self-consistent manner in presence of explicit water molecules represented by TIP4P-Ew water model at MP2/cc-pV(T+d)Z level. The weak point of the ff14ipq force field is overstabilization of salt bridges.

Cerutti, David S., William C. Swope, Julia E. Rice, and David A. Case. Ff14ipq: A Self-Consistent Force Field for Condensed-Phase Simulations of Proteins. Journal of Chemical Theory and Computation 10, no. 10 (October 14, 2014): 4515–34. doi:10.1021/ct500643c.

ff14SB (2015)

Another attempt to corect for deficiencies of ff99SB by using new side chain dihedral parameters based on MP2 level calculations followed by adjustment to the backbone φ energy profile. Used the old ff99SB charge set.

Maier, James A., Carmenza Martinez, Koushik Kasavajhala, Lauren Wickstrom, Kevin E. Hauser, and Carlos Simmerling. “Ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from Ff99SB.” Journal of Chemical Theory and Computation 11, no. 8 (August 11, 2015): 3696–3713. https://doi.org/10.1021/acs.jctc.5b00255.

ff15ipq (2016)

The motivation for the development was to address the salt bridge overstabilization problem of ff14ipq. However, this forcefiled is not a small correction applied to ff14ipq. It is is a complete semi-automatic rederivation of all parameters with the different water model SPC/Eb. The salt bridge overstabilization issue was addressed by increasing radius of polar hydrogens bonded to nitrogen. The modified FF performed as well or better than the other fixed charge force fields. Polarizable CHARMM Drude-2013 and AMOEBA performed better in this respect, as expected.

Problems related to protein stability persist. Even 4 μs simulations “were not sufficiently long to obtain converged estimates of secondary structure of polypeptides”. In simulation tests some proteins deviated significantly near the end of several microsecond simulations, and it is not clear whether this is a transient fluctuation or transition to a different state.

Debiec, Karl T., David S. Cerutti, Lewis R. Baker, Angela M. Gronenborn, David A. Case, and Lillian T. Chong. Further along the Road Less Traveled: AMBER Ff15ipq, an Original Protein Force Field Built on a Self-Consistent Physical Model. Journal of Chemical Theory and Computation 12, no. 8 (August 9, 2016): 3926–47. doi:10.1021/acs.jctc.6b00567

ff15FB (2017)

The goal was to reoptimize intramolecular bond, angle, and dihedral parameters to fit MP2 level QM data without modifying the nonbonded parameters. The parameter optimization was done using ForceBalance package. Same peformance as the earlier versions for equilibrium properties, improved performance for temperature dependence. For best agreement with experiment recommended to use with the TIP3P-FB water. The TIP3P-FB model was parametrized to reproduce the temperature and pressure dependence of a wide range of thermodynamic properties.

ff19SB (2020)

Backbone dihedral parameters optimized using as reference data the entire 2D quantum mechanics (QM) energy surface. Both QM and MM calculations were done in aqueous solution. AMBER ff19SB uses CMAP torsional potentials. The authors concluded that “ff19SB, when combined with a more accurate water model such as OPC, should have better predictive power for modeling sequence-specific behavior, protein mutations, and also rational protein design”.

CHARMM

CHARMM19 (1996)

United-atom model, originally released in 1985.

CHARMM22 (1998)

All-atom model.

CHARMM27 (2000)

Update to the CHARMM22 featuring the optimization of nucleic acid and lipid parameters, as well as the introduction of a number of new ions.

CHARMM22/CMAP (2004)

This forcefield introduces a tabulated correction for the φ-, ψ-angular dependence of the potential energy. As a result of the application of CMAP correction, the dynamical and structural properties of proteins were significantly improved in molecular dynamics simulations.

CHARMM27r (2005)

Improved lipid parameter set.

CHARMM35 (2008)

Carbohydrate parameter set.

CGenFF (2009)
CHARMM36 (2012)

refined backbone CMAP potentials and introduced new side-chain dihedral parameters. The updated CMAP corrected the C22/CMAP FF bias towards alpha-helical conformations.

Review
CHARMM Force Field Files

GROMOS

53A5, 53A6 (Oostenbrink et al., 2004)
54A7, 54B7 (N.Schmid et al., 2011)
54A8 (M.M.Reif et al., 2012)

Parameter sets for GROMOS force fields are specified in accordance with the following format:

OPLS

Force fields of the OPLS family are designed for simulating liquids that contain organic molecules and proteins. The VDW parameters are optimized using experimental liquid properties, mainly enthalpies of vaporization and densities. The atomic partial charges are derived using QM and experimental condensed-phase properties. An important part of the OPLS philosophy is balancing solvent-solvent and solute-solvent interactions.

OPLS-AA (1996)

This is the first all-atom OPLS force field. Bond stretching and angle bending parameters are taken from the AMBER force field. The torsional parameters were fit to the RHF/6-31G* calculations of about 50 organic molecules and ions. The charges are empirical and have been obtained from fitting to reproduce properties of organic liquids.

Jorgensen W, Maxwell D, Tirado-Rives J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J Am Chem Soc. 1996;118: 11225–11236.

OPLS-AA/L (2001)

Large data set, more than 2000 data points of energies for the 20 amino acids based on geometry optimization at the HF/6-31G** level followed by single-point LMP2/cc-pVTZ(-f) calculations. This level of theory is accurate within 0.25 kcal/mol.

Kaminski GA, Friesner RA, Tirado-Rives J, Jorgensen WL. Evaluation and Reparametrization of the OPLS-AA Force Field for Proteins via Comparison with Accurate Quantum Chemical Calculations on Peptides. J Phys Chem B. 2001;105: 6474–6487. doi:10.1021/jp003919d

OPLS_2005 (2005)

Main goal - to broaden the coverage of OPLS_2001 and refine torsion parameters using a larger dataset. The new Data set included torsion profiles from 637 compounds.

Banks JL, Beard HS, Cao Y, Cho AE, Damm W, Farid R, et al. Integrated Modeling Program, Applied Chemical Theory (IMPACT). J Comput Chem. 2005;26: 1752–1780. doi:10.1002/jcc.20292

OPLS-AAx & OPLS/CM1Ax (2012)

The representation of chlorine, bromine, and iodine in aryl halides has been modified in the OPLS-AA and OPLS/CM1A force fields in order to incorporate halogen bonding. It was accomplished by adding a partial positive charge in the region of the σ-hole.

OPLS2.0 (2012)

OPLS2 was developed to improve the accuracy of drug-like molecules. Substantially expanded data set contained QM data on more than 11,000 molecules. CM1A-BCC charges were used. The BCC terms were parameterized against the OPLS-AA charges for a core set of 112 molecules and the electrostatic potential at the HF/6-31G* level. The BCC terms were empirically adjusted to minimize the errors with regard to the ASFE using a training set of 153 molecules.

Shivakumar D, Harder E, Damm W, Friesner RA, Sherman W. Improving the Prediction of Absolute Solvation Free Energies Using the Next Generation OPLS Force Field. J Chem Theory Comput. 2012;8: 2553–2558. doi:10.1021/ct300203w

OPLS-AA/M (2015)

New torsion parameters using higher level ωB97X-D(23)/6-311++G(d,p) and B2PLYP-D3BJ/aug-cc-pVTZ(26) QM calculations.

Robertson MJ, Tirado-Rives J, Jorgensen WL. Improved Peptide and Protein Torsional Energetics with the OPLS-AA Force Field. J Chem Theory Comput. 2015;11: 3499–3509. doi:10.1021/acs.jctc.5b00356

OPLS3 (2016)

Added off-atom charge sites to represent halogen bonding and aryl nitrogen lone pairs. Complete refit of peptide dihedral parameters using an order of magnitude more data. Claimed 30% improvement. Still the same original VDW parameters.

Harder E, Damm W, Maple J, Wu C, Reboul M, Xiang JY, et al. OPLS3: A Force Field Providing Broad Coverage of Drug-like Small Molecules and Proteins. J Chem Theory Comput. 2016;12: 281–296. doi:10.1021/acs.jctc.5b00864

Polarizable Force fields

AMBER

ff02pol (P. Cieplak et al., 2001)
ff12pol (J. Wang et al., 2011)

AMOEBA (Atomic Multipole Optimized Energetics for Biomolecular Applications)

AMOEBA-2002 (Ren and Ponder, 2002)
AMOEBA-2013 (Shi et al., 2013)

AMOEBA-2013 uses permanent electrostatic multipole (dipole and quadrupole) moments at each atom and explicitly treats polarization effects under various chemical and physical conditions.

AMOEBA+ (C. Liu et al., 2019)
AMOEBA+(CF) (C. Liu et al., 2020)
Q-AMOEBA (N.Mauger et al., 2022)

OPLS

OPLS/PFF https://onlinelibrary.wiley.com/doi/10.1002/jcc.10125

Force field for water with polarization capabilities. Intended for simulations that explicitly take nuclear quantum effects into account.

CHARMM

CHARMM fluctuating charge model (S.Patel et al., 2004)
CHARMM Drude model (P.E.M. Lopez, 2013)
CHARMM Drude-19 F.-Y. Lin, 2020
Drude parameters for DPPC J. Chowdhary, 2013
Drude parameters for DOPC and POPE H. Chu, 2017

Force Fields for Small Molecules

GAFF
CGenFF
  1. (F.-U.Lin & AD. MacKerell, 2020) Force Fields for Small Molecules

Additional Reading

  1. (Dauber-Osguthorpe, 2019) Biomolecular force fields: where have we been, where are we now, where do we need to go and how do we get there? - Review of the origins of FF based calculations, theory and methodology of FF development.

  2. (Hagler, 2019) Force field development phase II: Relaxation of physics-based criteria… or inclusion of more rigorous physics into the representation of molecular energetics. - The latest developments in improvement of FF accuracy and robustness are discussed.

  3. Tinker-HP is a multi-CPUs and multi-GPUs/multi-precision, MPI massively parallel package dedicated to long molecular dynamics simulations with classical and polarizable force fields, neural networks and advanced QM/MM.

  4. K. Vanommeslaeghe, A.D. MacKerell Jr., 2015. Review of additive and polarizable CHARMM force fields for biophysics and computer-aided drug design.

Key Points

  • There are different types of force fields designed for different types of simulations.

  • Induction effects are not accounted for by fixed-charge force fields.

  • Using more accurate and diverse target data allows MD force fields to be improved.