Hands-on 6: A Comparative Performance Ranking of Molecular Dynamics Software
Overview
Teaching: 30 min
Exercises: 5 min
Questions
How do I evaluate the CPU efficiency of a simulation?
How fast and efficiently will my simulation run with different programs and computing resources?
Objectives
Learn how to request the right computational resources
Requesting the right computational resources is essential for fast and efficient simulations. Submitting a simulation with more CPUs does not necessarily mean that it will run faster; in some cases, a simulation will run slower with more CPUs. There is also a choice between CPU and GPU versions. When deciding on the number of CPUs, it is crucial to consider both simulation speed and CPU efficiency. If CPU efficiency is low, you will be wasting resources. This will negatively impact your priority, and as a result, you will not be able to run as many jobs as you would if you used CPUs more efficiently. To assess CPU efficiency, you need to know how fast a serial simulation runs and then compare the expected 100% efficient speedup (speed on 1 CPU × N) with the actual speedup on N CPUs. For example, if a serial run achieves 5 ns/day, a perfectly efficient run on 8 CPUs would reach 40 ns/day; if it actually reaches 30 ns/day, the CPU efficiency is 30/40 = 75%.
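On clusters that provide the Slurm seff utility (including the Alliance systems), you can also check the efficiency of a completed job directly; the job ID below is a placeholder.
# Report CPU and memory efficiency of a finished job (replace 12345678 with your job ID)
seff 12345678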
Here is a chart of the maximum simulation speed of all MD engines tested on the Alliance systems. These results may give you valuable insight into how fast and efficiently you can expect your simulation to run with different packages and resources.
Submission scripts for running the benchmarks.
GROMACS
Extend simulation for 10000 steps
gmx convert-tpr -s topol.tpr -nsteps 10000 -o next.tpr
Submission script for a CPU simulation
#!/bin/bash
#SBATCH --mem-per-cpu=4000M
#SBATCH --time=10:00:00
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=4
module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 gromacs/2023.2
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
srun gmx_mpi mdrun -s next.tpr -cpi state.cpt
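After the job finishes, the achieved speed can be read from the end of the GROMACS log. A minimal example, assuming the default log file name md.log:
# Print the final performance summary (ns/day and hour/ns) from the GROMACS log
grep -B1 "Performance:" md.log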
Submission script for a single GPU simulation
#!/bin/bash
#SBATCH --mem-per-cpu=2000M
#SBATCH --time=1:00:00
#SBATCH --cpus-per-task=12
#SBATCH --gpus-per-node=1
module load StdEnv/2020 gcc/9.3.0 cuda/11.4 openmpi/4.0.3 gromacs/2023.2
gmx mdrun -ntomp ${SLURM_CPUS_PER_TASK:-1} \
-nb gpu -pme gpu -update gpu -bonded cpu -s topol.tpr
Submission script for a multiple GPU simulation
#!/bin/bash
#SBATCH --mem-per-cpu=2000M
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=12
#SBATCH --gpus-per-task=1
module load StdEnv/2020 gcc/9.3.0 cuda/11.4 openmpi/4.0.3 gromacs/2023.2
srun gmx_mpi mdrun -ntomp ${SLURM_CPUS_PER_TASK:-1} -npme 1 \
-nb gpu -pme gpu -update gpu -bonded cpu -s topol.tpr
PMEMD
Submission script for a single GPU simulation
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --gpus 1
#SBATCH --mem-per-cpu=2000M
#SBATCH --time=1:00:00
module --force purge
module load StdEnv/2020 gcc/9.3.0 cuda/11.4 openmpi/4.0.3 amber/20.12-20.15
pmemd.cuda -O -i pmemd.in -o production.log -p prmtop.parm7 -c restart.rst7
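pmemd reports the average simulation speed in the timings section at the end of the output file; once the run has finished, it can be extracted with a simple grep:
# Print the average speed (ns/day) from the pmemd timings section
grep "ns/day" production.log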
Submission script for a multiple GPU simulation
The multi-GPU pmemd (pmemd.cuda.MPI) is intended only for AMBER methods that run multiple simulations at once, such as replica exchange. A single simulation does not scale beyond one GPU.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --gpus-per-node=2
#SBATCH --mem-per-cpu=2000M
#SBATCH --time=1:00:00
module --force purge
module load StdEnv/2020 gcc/9.3.0 cuda/11.4 openmpi/4.0.3 amber/20.12-20.15
srun pmemd.cuda.MPI -O -i pmemd_prod.in -o production.log \
-p prmtop.parm7 -c restart.rst7
NAMD 3
Submission script for a GPU simulation
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --gpus-per-node=a100:2
#SBATCH --mem-per-cpu=2000M
#SBATCH --time=1:00:00
NAMDHOME=$HOME/NAMD_3.0b3_Linux-x86_64-multicore-CUDA
$NAMDHOME/namd3 +p${SLURM_CPUS_PER_TASK} +idlepoll namd3_input.in
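NAMD writes its log to standard output, so with the script above the performance information ends up in the Slurm output file (slurm-<jobid>.out by default); the file name below is a placeholder.
# Print NAMD performance estimates (s/step and days/ns) from the job output
grep -E "Benchmark time|TIMING:" slurm-12345678.out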
How to make your simulation run faster?
It is possible to increase the time step to 4 fs by using hydrogen mass repartitioning (HMR). The idea is that hydrogen masses are increased while the masses of the heavy atoms they are bonded to are decreased by the same amount, so that the total mass of the system stays constant. Hydrogen masses can be repartitioned automatically with the parmed program:
module --force purge
module load StdEnv/2020 gcc ambertools python scipy-stack
source $EBROOTAMBERTOOLS/amber.sh
parmed prmtop.parm7
ParmEd: a Parameter file Editor
Loaded Amber topology file prmtop.parm7
Reading input from STDIN...
> hmassrepartition
> outparm prmtop_hmass.parm7
> quit
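The repartitioned topology (prmtop_hmass.parm7) then replaces the original one in the submission scripts above, and the 4 fs time step is set in the MD input file. A minimal sketch of the relevant pmemd input settings, assuming bonds involving hydrogen are constrained with SHAKE (which HMR requires):
&cntrl
  dt = 0.004,        ! 4 fs time step enabled by hydrogen mass repartitioning
  ntc = 2, ntf = 2,  ! SHAKE constraints on bonds involving hydrogen
/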
References:
1. Lessons learned from comparing molecular dynamics engines on the SAMPL5 dataset
2. Delivering up to 9X the Throughput with NAMD v3 and NVIDIA A100 GPU
3. AMBER GPU Docs
4. Long-Time-Step Molecular Dynamics through Hydrogen Mass Repartitioning
Key Points
Requesting more CPUs does not necessarily make a simulation run faster, and low CPU efficiency wastes resources and lowers your priority.
To assess CPU efficiency, compare the actual speedup on N CPUs with the ideal N-fold speedup over a serial run.