Running Molecular Dynamics Simulations with AMBER

Overview

Teaching: 30 min
Exercises: 5 min
Questions
  • What simulation programs are available in the AMBER package?

  • How do I minimize the energy of a system?

  • How do I heat up a simulation system?

  • How do I equilibrate a simulation system?

Objectives
  • Learn how to minimize the energy of a system.

  • Learn how to heat up and equilibrate a simulation system.

AMBER MD engines.

The AMBER package includes two MD engines: SANDER and PMEMD. Both programs are available in serial and parallel versions.

SANDER

SANDER is a free simulation engine distributed with the AmberTools package. For parallel distributed simulations it uses MPI (the Message Passing Interface). The parallel version of SANDER implements a replicated-data structure.

Each CPU computes a portion of the potential energy and corresponding gradients for a set of atoms assigned to it. A global part of the code then sums the force vector and sends the result to each CPU. The processors then perform a molecular dynamics update step for the assigned atoms and communicate the updated positions to all CPUs in preparation for the subsequent molecular dynamics step.

This model provides a convenient programming environment, but its main problem is that the communication required at each step grows with the number of processors, limiting parallel scaling.

PMEMD

PMEMD is an extensively revised version of SANDER available only in the commercial AMBER package. The developers made many optimizations to improve both single-processor performance and parallel scaling. To avoid the data transfer bottleneck, PMEMD communicates to each processor only the coordinate information necessary for computing the pieces of the potential energy assigned to it. This code, however, does not support all of the options found in SANDER.

GPU-Accelerated PMEMD

The GPU-accelerated version of PMEMD (pmemd.cuda) uses NVIDIA GPUs to perform MD simulations. It is significantly faster than the CPU version, achieving high simulation speed by executing all calculations on a single GPU within its memory. This approach eliminates the bottleneck of moving data between the CPU and the GPU and allows very efficient GPU usage.

Modern GPUs are so fast that communication overhead between GPUs does not allow for efficient parallel scaling of an MD simulation to two or more GPUs.

While you can run a single simulation on several GPUs using the parallel PMEMD GPU version (pmemd.cuda.MPI), it will not run much faster than on a single GPU. The parallel GPU version is useful only for specific simulation types such as thermodynamic integration and replica-exchange MD. These types of jobs involve several completely independent simulations that can be executed concurrently on different GPUs. PMEMD allows running multiple copies of simulations within a single parallel run via the multi-pmemd mechanism described below.

[Figures: PMEMD parallel scaling on A100 and P100 GPUs]

Multi-sander and multi-pmemd simulations

Multi-sander and multi-pmemd are wrappers around parallel versions of these programs. These wrappers are invoked by preparing special input files. The wrappers allow running multiple copies of simulations within a single parallel job. The multi-sander and multi-pmemd mechanisms are also utilized for methods requiring multiple simulations to communicate with one another, such as thermodynamic integration and replica exchange molecular dynamics.
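
For example, to run two independent copies of a simulation as one parallel job, you can list the command-line options of each copy in a groupfile and pass it to pmemd.MPI with the -ng (number of groups) flag; the MPI tasks of the job are divided among the groups. The sketch below is a minimal illustration, and the input and coordinate file names in it are hypothetical:

# Hypothetical groupfile: one line of pmemd options per simulation copy
cat > groupfile << EOF
-O -i md_1.in -p system.parm7 -c system_1.rst7 -o md_1.out -r md_1.rst7
-O -i md_2.in -p system.parm7 -c system_2.rst7 -o md_2.out -r md_2.rst7
EOF

# Run both copies in one parallel job (launch from a job script with enough MPI tasks)
srun pmemd.MPI -ng 2 -groupfile groupfile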


Summary of available AMBER MD executables:

Version                        SANDER        PMEMD
Serial                         sander        pmemd
Parallel                       sander.MPI    pmemd.MPI
Single GPU, default link       -             pmemd.cuda -> pmemd.cuda_SPFP
Single GPU, single precision   -             pmemd.cuda_SPFP
Single GPU, double precision   -             pmemd.cuda_DPFP
Multi GPU, default link        -             pmemd.cuda.MPI -> pmemd.cuda_SPFP.MPI
Multi GPU, single precision    -             pmemd.cuda_SPFP.MPI
Multi GPU, double precision    -             pmemd.cuda_DPFP.MPI
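
To check which of these executables a particular module provides, you can list the binaries installed alongside sander after loading the module (this assumes the module places them on your PATH, as the submission scripts below do):

module --force purge
module load StdEnv/2023 amber/22

# List all SANDER and PMEMD builds shipped with this module
ls $(dirname $(which sander)) | grep -E 'sander|pmemd'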

Energy minimization.

Before simulating a system we need to relax it. Any atomic clashes must be resolved and the potential energy minimized to avoid unphysically large forces that can crash the simulation.

The general minimization strategy is to first restrain all solute atoms at their experimental coordinates and relax only the atoms that were added (solvent, ions, and missing fragments). This helps to stabilize the native conformation. There are no strict rules defining how many minimization stages are necessary; the choice depends on the composition of the simulation system. For a big system with many missing residues it is safer to carry out several minimization stages, gradually releasing the restraints. For example, you can first relax only the solvent and ions, then the lipid bilayer (if this is a membrane protein), then the added fragments, and finally the original protein side chains. Having more stages may be unnecessary, but it will not cause any problems.

For example, we could do a two-stage minimization. In the first stage we restrain all original atoms; in the second stage we restrain only the original backbone atoms. Our example protein is very small and we have limited time, so we skip the first stage and restrain only the protein backbone (the AMBER mask "(:1-96)&(@CA,N,O)" used below selects atoms named CA, N, or O in residues 1-96).
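
For reference, a sketch of what the skipped first-stage input could look like is shown below; it restrains all original protein atoms (residues 1-96) so that only the added solvent and ions relax. The file name min1.in and the flag values are illustrative and not part of the workshop data.

# Illustrative stage-1 minimization input: restrain the whole original protein
cat > min1.in << EOF
Stage 1: relax solvent and ions, restrain all original protein atoms
&cntrl
imin=1, ntmin=0, maxcyc=1000,    ! Minimization, method, number of cycles
ntpr=5,                          ! Print energies every ntpr steps
ntr=1, restraint_wt=10.0,        ! Harmonic Cartesian restraints and force constant
restraintmask=":1-96",           ! Restrain all atoms of residues 1-96
&end
END
EOF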

cd ~/scratch/workshop_amber/example_04/1_minimization

MD programs can do many different types of calculations, and every type of calculation has a number of parameters that allow us to control what will be done. To run a minimization we need to make an input file describing exactly what we want to do and how we want to do it.

AMBER MD programs read simulation parameters from this input file. Simulation parameters in AMBER are called “FLAGS”. The table below lists some important minimization FLAGS.

AMBER minimization parameters

Flag Value Description
imin 1 Turn on minimization
ntmin 0,1,2,3 Flag for the method of minimization
maxcyc integer The maximum number of cycles of minimization
ncyc integer If NTMIN=1 switch from SD to CG after NCYC cycles
ntpr integer n Print energies every n steps
ntr 1 Use Cartesian harmonic restraints
restraint_wt float Restraint force constant, kcal/mol/Å²
restraintmask ambermask Specifies restrained atoms

Methods of minimization (ntmin)

0 - Steepest descent + conjugate gradient. The first 4 cycles are steepest descent at the start of the run and after every non-bonded pair-list update.
1 - Steepest descent for the first NCYC cycles, then conjugate gradient.
2 - Steepest descent only.
3 - XMIN family of methods. The default is LBFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno), a popular algorithm in machine learning. The method incrementally learns from previous steps so that it can make the next step more accurate. It converges considerably faster than conjugate gradient but requires more memory.

Minimization input file min.in:

Energy minimization 
&cntrl 
imin=1, ntmin=0, maxcyc=1000,   ! Minimization, method, number of cycles 
ntpr=5,                         ! Print energies every ntpr steps
ntr=1,                          ! Use harmonic cartesian restraints   
restraint_wt=10.0,              ! Restraint force kcal/mol
restraintmask="(:1-96)&(@CA,N,O)",
&end
END

Submission script submit.sh:

#!/bin/bash
#SBATCH --mem-per-cpu=1000M
#SBATCH --time=3:00:00
#SBATCH --ntasks=4
module --force purge
module load StdEnv/2023 amber/22
srun pmemd.MPI -O -i min.in \
        -p ../1RGG_chain_A.parm7 \
        -c ../1RGG_chain_A.rst7 \
        -ref ../1RGG_chain_A.rst7 \
        -r minimized.nc -o mdout

This job runs about 30 sec on 4 CPUs.

If the minimization is successful, we expect to see large negative energies at the end of the output file.
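
A quick way to check is to look at the final energy report near the end of the output. In sander/pmemd minimization output this appears in a "FINAL RESULTS" section (the exact layout may differ between versions):

# Print the final energy summary of the minimization
grep -A 8 "FINAL RESULTS" mdout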

Heating

cd ~/scratch/workshop_amber/example_04/2_heating

Molecular dynamics parameters

Flag Value Description
dt 0.001 Time step, ps. Default 0.001
ntf 2 Force evaluation. Omit interactions involving bonds with H-atoms. Default 1 (complete interaction)
ntc 2 Flag for SHAKE. Bonds involving hydrogen are constrained.
ntt 1 Constant temperature, using the Berendsen weak-coupling algorithm.
tempi 150 Initial temperature. The velocities are assigned from a Maxwellian distribution at TEMPI
temp0 300 Reference temperature at which the system is to be kept
tautp 1 Time constant, in ps, for heat bath coupling, default is 1 ps.
ntp 1 Flag for constant pressure dynamics. 1 - MD with isotropic position scaling
barostat 1 Berendsen (default)
pres0 1 Reference pressure, default 1 bar
taup 4 Pressure relaxation time (in ps), default 1
ntwx 1000 Every ntwx steps, the coordinates will be written to the mdcrd file
ntpr 100 Print energies in the log every 100 steps, default 50

In these examples, bonds with hydrogens are not constrained and the default time step of 1 fs is used. To turn on SHAKE, use ntc=2 and ntf=2; with SHAKE enabled the time step can be increased to 2 fs (dt=0.002).
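
For reference, a sketch of a heating input with SHAKE enabled and a 2 fs time step is shown below. The file name heat_shake.in is illustrative and not part of the workshop inputs, which keep the 1 fs default:

# Illustrative heating input with SHAKE and a 2 fs time step
cat > heat_shake.in << EOF
Heating with SHAKE
&cntrl
ntc=2, ntf=2,                           ! Constrain bonds to H and omit their force evaluation
dt=0.002,                               ! 2 fs time step
ntt=1, tempi=150, temp0=300, tautp=1,   ! Berendsen thermostat
nstlim=10000, ntpr=100, ntwx=1000,      ! Steps, energy and coordinate output frequency
ntr=1, restraint_wt=10,                 ! Harmonic Cartesian restraints
restraintmask="(:1-96)&(@CA,N,O)",
&end
END
EOF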

Heating input file:

Heating 
&cntrl 
ntt=1,                                  ! Use Berendsen thermostat
tempi=150,temp0=300,tautp=1,            ! Initial and reference temperature, time constant
ntp=0,                                  ! No barostat
ntpr=100,                               ! Print energies every ntpr steps
ntwx=1000,                              ! Write coordinates every ntwx steps
nstlim=10000,                           ! Simulate nstlim steps
ntr=1,                                  ! Use harmonic cartesian restraints 
restraint_wt=10,                        ! Restraint force kcal/mol
restraintmask="(:1-96)&(@CA,N,O)",
&end
END

Heating submission script:

#!/bin/bash
#SBATCH --mem-per-cpu=1000M
#SBATCH --time=3:00:00
#SBATCH --ntasks=4
module --force purge
module load StdEnv/2023 amber/22
srun pmemd.MPI -O -i heat.in \
        -p ../1RGG_chain_A.parm7 \
        -c ../1_minimization/minimized.nc \
        -ref ../1RGG_chain_A.rst7 \
        -r heated.nc -o mdout

This job runs about 2 min on 4 CPUs.

Equilibration

cd ~/scratch/workshop_amber/example_04/3_equilibration

Constrained equilibration

Flag Value Description
ntx 5 Coordinates and velocities will be read from a restart file
irest 1 Restart simulations

Input file equilibrate_1.in:

Constrained equilibration
&cntrl 
irest=1,ntx=5,                      ! Restart using coordinates and velocities
ntt=1,temp0=300,tautp=1,            ! Use Berendsen thermostat
ntp=1,barostat=1,pres0=1,taup=1,    ! Use Berendsen barostat
ntpr=100,                           ! Print energies every ntpr steps   
ntwx=1000,                          ! Save coordinates every ntwx steps
nstlim=5000,                        ! Simulate nstlim steps
ntr=1,                              ! Turn on restraints
restraint_wt=10,                    ! Restraint force, kcal/mol
restraintmask="(:1-96)&(@CA,N,O)",
&end
END

Submission script for CPU-only job submit_1.sh:

#!/bin/bash
#SBATCH --mem-per-cpu=4000M
#SBATCH --time=3:00:00
#SBATCH --ntasks=4
module --force purge
module load StdEnv/2023 amber/22
srun pmemd.MPI -O -i equilibrate_1.in \
        -p ../1RGG_chain_A.parm7 \
        -c ../2_heating/heated.nc \
        -ref ../1RGG_chain_A.rst7 \
        -r equilibrated_1.nc \
        -o equilibration_1.log

This run takes about 3 minutes on 4 CPUs.

Submission script for GPU job submit_1_cuda.sh:

#!/bin/bash
#SBATCH --mem-per-cpu=1000M
#SBATCH --time=3:00:00
#SBATCH --ntasks=1
#SBATCH --gpus-per-node=1
module --force purge
# cuda/12.2 is incompatible with the current vGPU driver
module load arch/avx2 StdEnv/2020 gcc/9.3.0 cuda/11.4 openmpi/4.0.3 amber/20.12-20.15 
pmemd.cuda -O -i equilibrate_1.in \
        -p ../1RGG_chain_A.parm7 \
        -c ../2_heating/heated.nc \
        -ref ../1RGG_chain_A.rst7 \
        -r equilibrated_1.nc \
        -o equilibration_1.out

This job runs about 30 sec on 1 vGPU.

Unconstrained equilibration

Input file equilibrate_2.in:

Equilibration, Langevin thermostat + Berendsen barostat
&cntrl 
irest=1,ntx=5,                     ! Restart using coordinates and velocities
ntt=3,temp0=300,gamma_ln=1.0,      ! Langevin thermostat, collision frequency
ntp=1,barostat=1,pres0=1,taup=1.0, ! Berendsen barostat
ntpr=100,                          ! Print energies every ntpr steps   
ntwx=1000,                         ! Save coordinates every ntwx steps
nstlim=1000000,                    ! Simulate nstlim steps
&end
END

Submission script submit_2.sh:

#!/bin/bash
#SBATCH --mem-per-cpu=1000M
#SBATCH --time=3:00:00
#SBATCH --ntasks=1
#SBATCH --gpus-per-node=1
module --force purge
module load arch/avx2 StdEnv/2020 gcc/9.3.0 cuda/11.4 openmpi/4.0.3 amber/20.12-20.15 
pmemd.cuda -O -i equilibrate_2.in \
        -p ../1RGG_chain_A.parm7 \
        -c equilibrated_1.nc \
        -r equilibrated_2.nc \
        -x mdcrd_2.nc \
        -o equilibration_2.out

This job runs about 10 min on 1 vGPU.

References

  1. An overview of the Amber biomolecular simulation package
  2. Running simulations with GPU acceleration
  3. Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born
  4. Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 2. Explicit Solvent Particle Mesh Ewald

Analyzing simulation logs

Extract selected energy components from an MD log and save them in a table using cpptraj. We will use the script ~/scratch/workshop_amber/scripts/extract_energies.sh. First copy the workshop scripts into ~/bin:

mkdir ~/bin
cp ~/scratch/workshop_amber/scripts/* ~/bin

The contents of the script ~/bin/extract_energies.sh:

#!/bin/bash
echo "Usage: extract_energies simulation_log_file" 

log=$1
cpptraj << EOF
readdata $log
writedata energy.dat $log[Etot] $log[TEMP] $log[PRESS] $log[VOLUME] time 0.1
EOF

Extract selected energy components.

cd ~/scratch/workshop_amber/example_04/3_equilibration
module purge
module load StdEnv/2023 amber
~/bin/extract_energies.sh equilibration_2.out

Plot energy components with python

Go to Jupyter desktop

cd ~/scratch/workshop_amber/example_04/3_equilibration
module purge
module load StdEnv/2023 amber

Read the table from the file energy.dat into a pandas dataframe and plot it:

python ~/bin/plot_energies.py

File plot_energies.py:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_table('energy.dat', delim_whitespace=True)
df.columns=["Time", "Etot", "Temp", "Press", "Volume"]

df.plot(subplots=True, x="Time", figsize=(6, 8))
plt.legend(loc='best')
plt.savefig('energies.png')
# plt.show()

Managing trajectories

You can remove from a trajectory all components that are not essential for analysis, for example water and ions. The first cpptraj command below removes everything except residues 1 to 96 and saves every second frame; the second creates a matching topology file for the stripped system.

cpptraj<<EOF
parm ../1RGG_chain_A.parm7
trajin mdcrd.nc 1 last 2
strip !(:1-96)
trajout mdcrd_nowat.nc 
run
EOF
cpptraj<<EOF
parm ../1RGG_chain_A.parm7
parmstrip !(:1-96)
parmwrite out prmtop_nowat.parm7
run
EOF
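
To verify that the stripped trajectory and topology are consistent, you can load them together and let cpptraj report the number of atoms and frames (a quick sanity check using the file names produced above):

cpptraj << EOF
parm prmtop_nowat.parm7
trajin mdcrd_nowat.nc
run
EOF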

Transferring an equilibrated system between simulation packages.

Simulation packages have different methods and performance. It is useful to be able to transfer a running simulation from one package to another.

Imagine that you started your project with GROMACS, but later realized that you need to run a constant-pH simulation, so you need to switch to AMBER. Want to study conformational transitions? Gaussian accelerated MD is not available in GROMACS, which is another reason to move to AMBER or NAMD. Want to apply custom forces? That is easy to do with Tcl scripting in NAMD.

Moving a simulation from AMBER to GROMACS.

To transfer a simulation to GROMACS, we need to convert the topology and the restart files.

cd ~/scratch/workshop_amber/pdb/1RGG/AMBER_to_GROMACS

Convert AMBER topology to GROMACS

module purge
module load StdEnv/2023 amber gromacs

python << EOF
import parmed as pmd

# Load the AMBER topology and coordinates, then save them in GROMACS formats
amber = pmd.load_file("prmtop.parm7", "inpcrd.rst7")
amber.save('topol.top')
amber.save('inpcrd.gro')
EOF

Make index file

The default index groups are OK:

gmx make_ndx -f inpcrd.gro <<EOF
q
EOF

Generate positional restraints file for the protein backbone.

gmx genrestr -f inpcrd.gro -fc 500.0 -n index.ndx -o backbone.itp << EOF
Backbone
EOF

Add definitions of the position restraints to the topology file “topol.top”. Use a text editor of your choice to insert the following lines at the end of the “system” molecule block, just before the next [ moleculetype ] section (as shown):

#ifdef POSRES
#include "backbone.itp"
#endif

[ moleculetype ]
; Name            nrexcl

Define position restraints in the input file min.mdp:

; Turn on position restraints
define = -DPOSRES

Convert AMBER restart to GROMACS restart.

python << EOF
import parmed as pmd

# Load the topology with the restart coordinates, attach velocities, and save as .gro
amber = pmd.load_file("prmtop.parm7", "rest.rst7")
ncrest = pmd.amber.Rst7("rest.rst7")
amber.velocities = ncrest.vels
amber.save("restart.gro")
EOF

Create a portable binary run input file (topol.tpr)

module purge
# gromacs modules from StdEnv/2023 fail
module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 gromacs/2022.3
gmx grompp -p topol.top -c restart.gro -f gromacs_production.mdp -maxwarn 2

The workshop data contains an example gromacs_production.mdp in the directory workshop_amber/pdb/1RGG/AMBER_to_GROMACS.
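
If grompp completes without errors, the resulting topol.tpr can be run directly with GROMACS. A minimal example is shown below; the run name "production" is illustrative:

# Continue the simulation in GROMACS
gmx mdrun -s topol.tpr -deffnm production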

Simulation with NAMD

Because NAMD natively supports AMBER topology files, simulating a system prepared with AMBER tools requires only NAMD simulation input files and NAMD-compatible coordinate files, such as PDB or NAMD binary coordinates.
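
As an illustration, the AMBER-related part of a NAMD configuration could look like the sketch below; the file names and the non-bonded settings shown here are assumptions, so refer to the tested inputs in the workshop data for complete configurations.

# Illustrative fragment of a NAMD configuration reading AMBER topology and coordinates
cat > namd_amber.conf << 'EOF'
amber            yes                    ;# read an AMBER parm7 topology
parmfile         ../1RGG_chain_A.parm7
ambercoor        ../1RGG_chain_A.rst7   ;# ASCII AMBER coordinates (a PDB file can be used instead)
readexclusions   yes
exclude          scaled1-4
1-4scaling       0.833333               ;# AMBER 1-4 electrostatic scaling (1/1.2)
scnb             2.0                    ;# AMBER 1-4 van der Waals scaling
EOF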

In the workshop data, you will find example simulation input files for minimization, heating, and equilibration:

ls ~/scratch/workshop_amber/namd/sim_namd
1-minimization  2-heating  3-equilibration  4-production

Key Points