Assigning Protonation States to Residues in a Protein

Overview

Teaching: 30 min
Exercises: 10 min
Questions
  • Why titratable aminoacid residues can have different protonation states?

  • How to determine protonation state of a residue in a protein?

  • What are the weaknesses of fixed protonation state simulations?

Objectives
  • Understand why it is necessary to assign the correct protonation state

  • Learn how to determine protonation state of a protein

  • Learn how to assign protonation state to a residue

It is important to consider amino acid protonation states

Setting up a simulation system requires assigning protonation states and, possibly, tautomers to the HIS residues. The protonation states of titratable amino acids (Arg, Lys, Tyr, Cys, His, Glu, Asp) depend on the local micro-environment and pH. A highly polar microenvironment will stabilize the charged form, while a less polar microenvironment will favor the neutral form. At physiological pH, TYR, LYS, CYS, and ARG are almost always in their standard protonation states, while GLU, ASP, and HIS can be in non-standard forms.

The protonation pattern of proteins is crucial for their catalytic function and structural stability. For a classic MD simulation, protonation states must be determined and assigned before the simulation can begin. For realistic MD simulations, it is essential to assign the right protonation states. The wrong protonation states can have dramatic effects and invalidate the results. Numerous MD simulation studies have demonstrated the importance of protein protonation states [1, 2, 3, 4, 5].

How to Determine Protonation States of Residues in a Protein?

For predicting the pKa values of protein residues, several web servers and standalone programs are available. The underlying methods used by all of them fall into two broad categories: empirical (such as propka) and Poisson-Boltzmann methods. The disadvantage of the empirical method is that it works well only for proteins similar to those in the training set. It is not guaranteed to work well for all proteins. Programs implementing Poisson-Boltzmann continuum electrostatics have a solid physical basis. Most of them, however, only sample a single protein conformation, which is the primary source of inaccuracy. One exception is MCCE, which samples sidechain rotamers, giving a more precise picture of coupled ionization and position changes. MCCE also has limitations because it does not sample backbone conformations. The most rigorous method is constant pH simulations in which the ionization states are dynamically adjusted in MD simulations. It is a computationally expensive method, but a very efficient GPU implementation of constant pH molecular dynamics has recently been developed and implemented in AMBER.

For predicting the pKa values of protein residues, several web servers and standalone programs are available.

There is no perfect pKa prediction method. Deviations from experimental values can sometimes be significant. The best practice is to compare the results obtained from different techniques and, if possible, to use experimentally measured values.

Calculating pKa’s

  1. Calculate pKa’s of residues in the PDB entry 1RGG using H++ server.
  2. What protonation states of Asp79 and His53 are appropriate for simulation at pH 6?
  3. Repeat calculations using PDB2PQR server and compare the results.
  4. Compare calculated pKa’s with the experimental. How accurate are the predicted pKa values?

What protonation states are appropriate for simulating Asp79 and His53 at pH 6?

Solution

If pKa > pH the probability that the residue is protonated is > 50%, and we use the protonated form.
If pKa < pH the probability that the residue is protonated is < 50% and we use the deprotonated form.

ASP79 has pKa 7.2 (experimental 7.37), it is protonated at pH 6 and we rename it to ASH
HIS53 has pKa 8.3 (experimental 8.27), it is also protonated at pH 6 and we rename it to HIP

How to select protonation state of a residue?

Assigning protonation states in structure files

A more consistent and convenient way to select the desired form of amino acid is to change its name in the structure file before generating a topology. The neutral states of LYS, ASP, and GLU can be chosen by renaming them LYN, ASH, and GLH, respectively. You can select the appropriate form of HIS by renaming HIS to HIE (proton on NE2), HID (proton on ND1), or HIP (both protons).

LYS (+1) - LYN  (0)  
ASP (-1) - ASH  (0)  
GLU (-1) - GLH  (0)  
HIS  (0) - HIE  (0)  
HIS  (0) - HID  (0)  
HIS  (0) - HIP (+1)    

Selecting protonation states with check_structure.

Let’s change ASP20 and ASP26 in the file 1ert.pdb to the neutral form ASH.

cd ~/scratch/workshop_amber/example_02
check_structure -i 1ert.pdb -o 1ert_protonated.pdb \
command_list --list "\
add_hydrogen --list A:asp20ash,A:asp26ash --add_mode list;\
water --remove yes"

You can verify that residues are changed by grepping ASH.

Selecting protonation states with VMD.

Change ASP20 and ASP26 in the file 1ert.pdb to the neutral form ASH and remove water.

Solution

ml StdEnv/2023 vmd
vmd
mol new 1ert.pdb
set s [atomselect top "resid 20 26"]
$s set resname ASH
set s [atomselect top "protein"]
$s writepdb 1ert_protonated.pdb
quit

Assigning protonation states with the GROMACS pdb2gmx module

Protonation states can be assigned using the GROMACS pdb2gmx program. By default, pdb2gmx will select charged forms of LYS, ASP, or GLU. For HIS, it will try to place the proton on either ND1 or NE2 based on an optimal hydrogen bonding conformation. You can override the default behavior and select protonation manually using options -lys, -asp, -glu, -his.

Downsides:

The downside of this method is that it can not be scripted. The manual selection is cumbersome because pdb2gmx will be prompting to select the protonation state for each of the residues in a PDB file. Besides, pdb2gmx changes residue names only in the output topology file. Residue names are left unchanged in the output .pdb and .gro files.

Limitations of Fixed Protonation State Simulations

The use of constant protonation states in molecular dynamics simulations has its disadvantages. When conformational changes occur in natural systems, protonation patterns often change as well. In molecular dynamics simulations with fixed states, these processes are not coupled, making it difficult to fully understand proton-coupled conformational dynamics. Consider using constant pH simulations if proton-coupled dynamics are essential to your research. MD with constant pH is currently implemented in both AMBER and NAMD. At the moment, it is not officially implemented in GROMACS, but a modified version is available [6].

Combining all structure preparation steps in one check_structure script

cd ~/scratch/workshop_amber/example_03
check_structure -i 1rgg.pdb -o 1RGG_chain_A_prot.pdb \
command_list --list "\
chains --select A;\
add_hydrogen --list A:his53hip,A:asp79ash --add_mode list;\
altloc --select A5:B,A54:B,A6:A,A13:A,A42:A,A85:A,A91:A;\
getss --mark all;\
ligands --remove all;\
water --remove yes"

You can also save commands in a file and pass it as an argument to “command_list –list”:

check_structure -i 1rgg.pdb -o 1RGG_chain_A_prot.pdb command_list --list prep_1rgg.chk

References

  1. Constant-pH Molecular Dynamics Simulations for Large Biomolecular Systems

  2. GPU-Accelerated Implementation of Continuous Constant pH Molecular Dynamics in Amber: pKa Predictions with Single-pH Simulations

Combining all structure preparation steps in one VMD script

Combine all previous steps together and create VMD script to prepare MD simulation system for the hydrolaze PDB structure 1RGG. The script should perform the following steps:

  1. Select molecule A
  2. Remove non-protein molecules
  3. Select location ‘B’ for residues 5, 54 and location ‘A’ for all other residues with alternative locations
  4. Protonate Asp79 and His53
  5. Rename CYS 7 and 96 into CYX (cross-linked cystein)
  6. Save the resulting structure as 1RGG_chain_A_prot.pdb

Solution

cd ~/scratch/workshop_amber/example_03

Save the following commands in a file, e.g. prep_1RGG.vmd

# Load 1rgg.pdb into a new (top) molecule
mol pdbload 1rgg
# Select and save all chain A protein atoms
set s [atomselect top "protein and chain A"]
$s writepdb 1RGG_chain_A.pdb
# Delete the top molecule
mol delete top
# Load chain A into a new molecule 
# Loading only one chain will simplify selections commands 
mol new 1RGG_chain_A.pdb
# Protonate ASP79
set s [atomselect top "resid 79"]
$s set resname ASH
# Protonate HIS53
set s [atomselect top "resid 53"]
$s set resname HIP
# Rename cross-linked cysteins
set s [atomselect top "resid 7 96"]
$s set resname CYX
# Select the base and the alternate locations
set s [atomselect top "(altloc '') or (altloc A and resid 6 13 42 85 91) or (altloc B and resid 5 54)"]
# Save the selection
$s writepdb 1RGG_chain_A_prot.pdb
quit

Execute the script

vmd -e prep_1RGG.vmd

Key Points

  • Assigning correct protonation states of aminoacids in proteins is crucial for realistic MD simulations

  • Conformational changes in proteins may be accompanied by changes in protonation pattern.