The formats will sometimes be given with notation such as '%d' to indicate a decimal integer; '%6.3f' for a floating point number with up to 6 characters and 3 digits after the decimal place; or '%-7s' for a left-justified string 7 characters wide. This notation is compatible with C, C++, awk/nawk/gawk, and with a slight modification, Python.
The "" symbol is used to indicate one space.
The type of an argument is described using the following style:
= an alphanumeric string. In most cases, this is a valid filename;
= a single letter;
= a decimal integer;
= a decimal integer greater than zero;
= a decimal integer in the "long" range (depends on computer);
= a floating point or real number.
The name 'PDBQ' derives from 'PDB', the Protein DataBank, and 'Q', a common symbol for partial charge. As the name suggests, the PDBQ format is very similar to the PDB format for ATOM records, with a modification in columns 71-76 (counting the first column as 1, not 0) to carry the partial charge, as %6.3f. Thus, the format of the whole line is as follows:
"ATOM%5d%-4s%1s%-3s%1s%4d%1s%8.3f%8.3f%8.3f%6.2f%6.2f%4s%6.3f%8.2f%8.2f\n", atom_serial_num, atom_name, alt_loc, res_name, chain_id, res_num, ins_code, x, y, z, occupancy, temp_factor, footnote, partial_charge, atomic_fragmental_volume, atomic_solvation_parameter
The atomic fragmental volume and solvation parameters are derived from the method of Stouten et al . 1
The input file is often referred to as a "grid parameter file" or "GPF" for short. The scripts described in the appendices give these files the extension ".gpf". In the grid parameter file, the user must specify the following spatial attributes of the grid maps:
5. seven lines containing the non-bonded parameters for each pairwise-atomic interaction, in the following order: Y -C, Y -N, Y -O, Y -S, Y -H, Y - X , ( X is any other atom type) and Y - M ( M is a metal, say).
This latter method of specification is more intuitive for the user, while AutoGrid handles the calculation of the coefficients. By default, the Y - X and Y - M lines are copies of the Y -H line. But in some systems, such as receptors which consist of DNA/protein complexes, both sulphur and phosphorus can be present. In this scenario, the Y - X line can be used for modeling interactions with receptor-phosphorus atoms. A very rough approximation for phosphorus parameters is to borrow those of carbon.
The "elecmap" line in the grid parameter file is the filename of the electrostatic potential grid map. The following parameter, "dielectric", if negative, indicates that the distance-dependent dielectric function of Mehler and Solmajer 3 will be used. If positive, however, the value of that number will be used as a constant dielectric. For example, if the value were 40.0, then a constant dielectric of 40 would be used.
The grid field filename, which will be written in a format readable by AutoDock and AVS 2 . The filename extension must be `.fld'.
Number of x -, y - and z -grid points. Each must be an even integer number. When added to the central grid point, there will be an odd number of points in each dimension. The number of x -, y - and z -grid points need not be equal.
The user can explicitly define the center of the grid maps, respectively the x , y and z coordinates of the center of the grid maps (units: Å, Å, Å.) Or the keyword "auto" can be given, in which case AutoGrid will center the grid maps on the center of mass of the macromolecule.
This is always 0.5Å, when using the AutoDock 3.0 free energy function. It is used to smooth the pairwise atomic affinity potentials (both van der Waals and hydrogen bonds). See the Theory section for more details.
Either "nbp_coeffs" or "nbp_r_eps" keywords can be used to define Lennard-Jones or hydrogen bond interaction energy parameters. The keyword "nbp_coeffs" specifies coefficients and exponents, in the order "C n C m n m ", delimited by spaces; n and m are integer exponents. The units of C n and C m must be kcal mol -1 Å n and kcal mol -1 Å m respectively; n and m have no units.
Alternatively, the user can employ "nbp_r_eps" to specify the equilibrium distance and well depth, epsilon, for the atom pair. The equilibrium separation has units of Å and the well depth, epsilon, units of kcal mol -1 . The integer exponents n and m must be specified too.
In either case, the order of the parameters must be: Y -C, Y -N, Y -O, Y -S, Y -H, Y -X, and Y -M. Repeat 1 "map" line and the 7 "nbp_coeffs"or "nbp_r_eps" lines, for each atom type, Y , present in the ligand being docked.
This is added to all the values in a grid map, and is only set to a non-zero, positive number for hydrogen bonding maps. This value is essentially the penalty for un-formed hydrogen bonds in the complex.
(Optional.) Filename for the so-called "floating" grid map 3 ; filename extension `.map'. In such floating grids, the scalar at each grid point is the distance to the nearest atom in the receptor. These values could be used to guide the docking ligand towards the receptor's surface, thus avoiding non-interesting, empty regions.
The first six lines of each grid map hold header information which describe the spatial features of the maps and the files used or created. These headers are checked by AutoDock to ensure that they are appropriate for the requested docking. The remainder of the file contains grid point energies, written as floating point numbers, one per line. They are ordered according to the nested loops z( y( x ) ). A sample header from a grid map is shown below:
This is essentially two files in one. It is both an AVS field file, and and
input file with
-specific information `hidden' from AVS in the comments at the head of the file.
uses this file to check that all the maps it reads in are compatible with one-another and itself. For example, in this file, the grid spacing is 0.375 Angstroms, there are 60 intervals in each dimension, the grid is centered near (46,44,14), it was calculated around the macromolecule `
', and the
parameter file used to create this and the maps was `
'. This file also points to a second file, `
', which contains the minimum and maximum extents of the grid box in each dimension,
. Finally, it lists the grid map files that were calculated by
, here `
' and `
AutoDock 3.0 has an interface based on keywords. This is intended to make it easier for the user to set up and control a docking job, and for the programmer to add new commands and functionality. The input file is often referred to as a "docking parameter file" or "DPF" for short. The scripts described in the appendices give these files the extension ".dpf".
All delimiters where needed are white spaces. Default values, where applicable, are given in square brackets [thus]. A comment must be prefixed by the "
" symbol, and can be placed at the end of a parameter line, or on a line of its own.
There are two possible random number generator libraries. One is the system's own implementations, and the second is the platform-independent library from the University of Texas Biomedical School. If the user gives just one argument to "
", then AutoDock will use the system's implementation of the random number generator and corresponding system seed call. On most platforms, these are "drand48" and "srand48". The platform-independent library, however, requires two seed values. Giving two arguments to "seed" tells AutoDock to use the platform-independent library for random number generation.
The random-number generator (RNG) for each docking job can be `seeded' with either a user-defined, a time-dependent, or process-ID-dependent seed. These two seeds can be any combination of explicit long integers, the keyword "
" or the keyword "
". When two arguments to seed are given, the portable RNG is used; when one is given, the built-in RNG (usually the "drand48" C-function) is used. The portable RNG is required for the genetic algorithm and the Solis and Wets routines. The portable RNG cannot be used with the simulated annealing routine: this needs just one seed parameter. The keyword, "time" gives the number of seconds since the epoch. The epoch is referenced to 00:00:00 CUT (Coordinated Universal Time) 1 Jan 1970. The "pid" gives the UNIX process ID of the currently executing AutoDock process, which is reading this parameter file.
Atom names for all atom types present in ligand. Each must be a single character, and only one of: C, N, O, S, H, X, or M. The maximum number of characters allowed in this line is ATOM_MAPS, which is defined in the "autodock.h" include file. Do not use any spaces to delimit the types: they are not needed.
Filename for the first
affinity grid map of the 1st atom type. This keyword plus filename must be repeated for all atom types in the order specifed by the "
" command. In all map files a 6-line header is required, and energies must be ordered according to the nested loops z( y( x ) ).
Use this keyword to specify the center of the ligand,
which rotations will be made. (The coordinate frame of reference is that of the ligand PDBQ file.) Usually the rotation center of the ligand is the mean x,y,z-coordinates of the molecule. Inside
, the "
" xyz-coordinates are
from each atom's coordinates in the input PDBQ file. So internally, the ligand's coordinates become centered at the origin. Units: Å, Å, Å.
Initial coordinates for the center of the ligand, in the same frame of reference as the receptor's grid maps. The ligand, which has been internally centered using the "about" coordinates, has the xyz-coordinates of the initial translation "
tran0 x y z
on. Every run starts the ligand from this location.
The user must specify the absolute starting coordinates for the ligand, used to start each run. The user should ensure that the ligand, when translated to these coordinates, still fits within the volume of the grid maps. If there are some atoms which lie outside the grid volume, then AutoDock will automatically correct this, until the ligand is pulled completely within the volume of the grids. (This is necessary in order to obtain complete information about the energy of the initial state of the system.) The user will be notified of any such changes to the initial translation by AutoDock. (Units: Å, Å, Å.)
[1, 0, 0, 0°]
Respectively: Qx, Qy, Qz , Q w. Initial quaternion (applied to ligand) - Qx, Qy, Qz define the unit vector of the direction of rigid body rotation, and Q w defines the angle of rotation about this unit vector, in ° . (Units: none,none,none, °.)
Alternatively, the user can just give the keyword "random" and AutoDock will pick a random unit vector and a random rotation (between 0° and 360°) about this unit vector. Each run will begin at this same random rigid body rotation.
Number of dihedrals or rotatable bonds in the ligand. This may be specifed only if rotatable bonds have been defined using ROOT, BRANCH, TORS
. keywords in the PDBQ
" line. The number supplied to this command must agree with the number of torsions defined in this ligand PDBQ file. If this keyword is used, then the next keyword,
, must also be specified. Note that if
are not specified and there are defined torsions in the ligand PDBQ file,
assumes that the
are all zero, and does not change the initial ligand torsion angles. (See also "torsdof" below).
dihedral angles; there must be
floating point numbers specified on this line. Each value specified here will be added to the corresponding torsion angle in the input PDBQ file, at the start of each run. Torsion angles are only specified by two atoms, so the definition of rotations is relative. Units: °.
The first form, with one argument, defines the maximum translation jump for the first cycle that the ligand may make in one simulated annealing step. When "trnrf" is less than 1, the reduction factor is multiplied with the tstep at the end of each cycle, to give the new value for the next cycle. The second form allows the user to specify the value for the first cycle and the last cycle: AutoDock then calculates the reduction factor that satisfies these constraints. Units: Å.
(Optional) This defines the energy-barrier height applied to constrained torsions. When the torsion is at a preferred angle, there is no torsion penalty: this torsion's energy is zero. If the torsion angle falls within a disallowed zone, however, it can contribute up to the full barrier energy. Since the torsion-energy profiles are stored internally as arrays of type `unsigned short', only positive integers between 0 and 65535 are allowed.
(Optional) Adds a constraint to a torsion. The torsion number is identified by an integer. This identifier comes from the list at the top of the AutoTors-generated input ligand PDBQ file (on the REMARK lines). An energy profile will be calculated for this torsion. An inverted Gaussian bell curve is added for each new constraint. To completely specify each Gaussian, two floating point numbers are needed: the preferred angle and the half-width respectively (both in degrees). Note that the preferred angle should be specified in the range -180 ° to +180 ° ; numbers outside this range will be wrapped back into this range. This angle, c, is relative to the original torsion angle in the input structure. The half-width is the difference between the two angles at which the energy is half the barrier (B/2 in the diagram above). The smaller the half-width, the tighter the constraint.
If you wish to constrain to absolute-valued torsion angles, it will be necessary to zero the initial torsion angles in the ligand, before input to AutoTors . The problem arises from the ambiguous 2-atom definition of the rotatable bond B-C . To identify a torsion angle unambiguously, 4 atoms must be specified: A-B-C-D :
The sign convention for torsion angles which we use is anti-clockwise (counter-clockwise) are positive angles, clockwise negative. In the above diagram, looking down the bond B-C , the dihedral angle A-B-C-D would be positive.
There is no limit to the number of constraints that can be added to a given torsion. Each new torsion-constraint energy profile is combined with the pre-existing one by selecting the minimum energy of either the new or the existing profiles.
(Optional) (Use only with "gausstorcon") This switches on the storage and subsequent output of torsion energies. During each energy evaluation, the penalty energy for each constrained torsion, as specified by the "gausstorcon" command, will be stored in an array. At the end of each run, the final docked conformation's state variables are output, but with this command, the penalty energy for each torsion will be printed alongside its torsion angle.
(Optional) This command also adds a torsion constraint to the <integer>-th torsion, as numbered in the AutoTors-generated REMARKs. The first float defines the preferred relative angle , and the second specifies the full width of the allowed range of torsion angles (both in degrees). This type of torsion constraint is "hard" because the torsion is never allowed to take values beyond the range defined. For example, " hardtorcon 3 60. 10." would constrain the third torsion to values between 55° and 65°.
This specifies respectively the number and the coefficient of the torsional degrees of freedom (DOF) for the estimation of the change in free energy upon binding, D G binding . For the purposes of AutoDock 3.0, the number of torsional DOF is the number of rotatable bonds in the ligand, excluding any torsions that rotate one or more hydrogen atoms, e.g. hydroxyls, amines, methyls. By default, the coefficient is 0.3113 kcal mol -1 , although the user can override this as necessary. (Units: none; kcal mol -1 ).
These parameters are needed even if no rotatable bonds were defined in the ligand-PDBQ file. They are only used in the internal energy calculations for the ligand and must be consistent with those used in calculating the grid maps. (Units: kcal mol -1 Å n ; kcal mol -1 Å m ; none; none, respectively).
Respectively: reqm; e; n; m, This command is an alternative way of specifying the internal pairwise non-bonded energy parameters for flexible ligands, where AutoDock calculates the pairwise atomic potential using:
The first two arguments specify the equilibrium distance and well depth, epsilon, for the atom pair. The equilibrium separation has units of Å and the well depth, epsilon, units of kcal mol -1 . The integer exponents n and m must be specified too. Obviously, n m . (Units: Å; kcal mol -1 ; none; none, respectively).
(Optional) Internal ligand electrostatic energies will be calculated; the products of the partial charges in each non-bonded atom pair are pre-calculated, and output. Note that this is only relevant for flexible ligands.
[500. cal mol
Initial "annealing temperature"; this is actually the absolute temperature multiplied by the gas constant R . R = 8.314 J mol -1 K -1 = 1.987 cal mol -1 K -1 . (Units: cal mol -1 .)
Annealing temperature reduction factor, g [0.95 cycle -1 ]. See the equation at the bottom of page 5. At the end of each cycle, the annealing temperature is multiplied by this factor, to give that of the next cycle. This must be positive but < 1 in order to cool the system. Gradual cooling is recommended, so as to avoid " simulated quenching ", which tends to trap systems into local minima.
These keywords are all synonymous, and instruct
to use a linear or
temperature reduction schedule during
simulated annealing. Unless this keyword is given, a
reduction schedule is used, according to the
parameter just described. If the linear schedule is requested, then any
parameters will be ignored. The first simulated annealing cycle is carried out at the annealing temperature
. At the end of each cycle, the temperature is reduced by (
). The advantage of the linear schedule is that the system samples evenly across the temperature axis, which is vital in entropic calculations. Geometric temperature reduction schedules on the other hand, under-sample high temperatures and over-sample low temperatures.
Diagnostic output level. For SA (simulated annealing): 0 = no output, 1 = minimal output, 2 = full state output at end of each cycle; 3 = detailed output for each step. For GA and GA-LS (genetic algorithm-local search): 0 = minimal output, 1 = write minimum, mean, and maximum of each state variable at the end of every generation. Use "outlev 1" for SA, and "outlev 0" for GA and GA-LS. If you use "outlev 1" with GA-LS, you will generate very large log files.
Output frequency, n , for trajectory of ligand, in steps. If n = 0, then no trajectory states will be output; otherwise, every n th state will be output. The state consists of 7 floats describing the x,y,z translation, the x,y,z components of the quaternion unit vector, the angle of rotation about the quaternion axis; and any remaining floats describing the torsions, in the same order as described in the input ligand PDBQ file).
Trajectory filename. AutoDock will write out state variables to this file every "trjfrq" steps. Use the "traj" command in AutoDock's command mode to convert this trajectory of state-variables into a series of PDB frames. The "traj" command is described in § "Using the Command Mode in AutoDock"; see also § "Trajectory Files".
( Optional) Creates a "watch" file for real-time monitoring of an in-progress simulated annealing job. This works only if the "trjfrq" parameter is greater than zero. The watch file will be in PDB format, so give a ".pdb" extension. This file has an exclusive lock placed on it, while AutoDock is writing to it. Once the file is closed, the file is unlocked. This can signal to a watching visualization program that the file is complete and can now be read in, for updating the displayed coordinates. This file is written at exactly the same time as the trajectory file is updated
This is only used by the simulated annealing method. This keyword stipulates that the ligand's initial state cannot have an energy greater than the first value, nor can there be more than the second value's number of retries. Typical energy values range from 0 to 1000 kcal/mol. If the initial energy exceeds this value, a new random state is generated and tested. This process is iterated until the condition is satisfied. This can be particularly useful in preventing runs starting in exceptionally high energy regions. In such cases, the ligand can get trapped because it is unable to take a long enough translational jump. In those grids were the ligand is small enough to fit into the low energy regions with ease, there will not be many iterations before a favorable location is found. But in highly constrained grids, with large ligands, this initialization loop may run almost indefinitely.
The root mean square deviation (rmsd) of the docked conformations will be calculated with respect to the coordinates in the PDB or PDBQ file specified here. This is useful when the experimentally determined complex conformation of the ligand is known. The order of the atoms in this file must match that in the input PDBQ file given by the
command. These values of rmsd will be output in the last column of the final PDBQ records, after the clustering has been performed.
rms deviation tolerance for cluster analysis or `structure binning' , carried out after multiple docking runs. If two conformations have an rms less than this tolerance, they will be placed in the same cluster. The structures are ranked by energy, as are the clusters. The lowest energy representative from each cluster is output in PDBQ format to the log file. To keep the ligand's residue number in the input PDBQ file, use the `
' flag; otherwise the clustered conformations are numbered incrementally from 1. (Units: Å).
When more than one run is carried out in a given job, cluster analysis or `structure binning' will be performed, based on structural rms difference, ranking the resulting families of docked conformations in order of increasing energy. The default method for structure binning allows for atom similarity, as in a tertiary-butyl which can be rotated by +/-120°, but in other cases it may be desirable to bypass this similar atom type checking and calculate the rms on a one-for-one basis. The symmetry checking algorithm scans all atoms in the reference structure, and selects the nearest atom of identical atom type to be added to the sum of squares of distances. This works well when the two conformations are very similar, but this assumption breaks down when the two conformations are translated significantly. Symmetry checking can be turned off using the
command; omit this command if you still want symmetry checking.
(Clustering multi-job output only.)
will go into `cluster mode'. Use this command only to perform cluster analysis on the combined output,
, of several jobs. This command can be very useful when many jobs have been distributed to several machines and run in `parallel'. The docking parameter file will need the following keywords:
; and optionally write_all_cluster_members and/or
It is necessary to
the USER lines along with the ATOM records, since
parses the these lines to determine what the energy of that particular conformation was. For more information, see the example DPF files given later.
(Clustering multi-job output only.) This command is used only with the cluster command, to write out all members of each cluster instead of just the lowest energy from each cluster. This affects the cluster analysis PDBQ output at the end of each job.
This is the number of individuals in the population. Each individual is a coupling of a genotype and its associated phenotype. Usually, this number is fixed throughout the run. Typical values range from 50 to 200.
This is a floating point number from 0 to 1 denoting the crossover rate. Crossover rate is the expected number of pairs in the population that will exchange genetic material. Setting this value to 0 turns the GA into the evolutionary programming (EP) method, but EP would probably require a concomitant increase in the ga_mutation_rate in order to be effective.
These are floating point parameters used in the mutation of real number genes. They correspond to the alpha and beta parameters in a Cauchy distribution. Alpha roughly corresponds to the mean, and beta to something like the variance of the distribution. It should be noted, though, that the Cauchy distribution doesn't have finite variance. For the mutation of a real valued gene, a Cauchy deviate is generated and then added to the original value.
This command sets the global optimizer to be a genetic algorithm [GA]. This is required to perform a GA search. This passes any '
' parameters specified
this line to the global optimizer object. If this command is omitted, or it is given before the '
' parameters, your choices will not take effect, and the default values for the optimizer will be used.
To use the
Lamarckian genetic algorithm
also specify the parameters for local search, and then issue either the '
' or '
' command. The former command uses the strict Solis and Wets local search algorithm, while the latter uses the pseudo-Solis and Wets algorithm: see earlier for details about how they differ.
This is the maximum number of iterations that the local search procedure apply to the phenotype of any given individual. This is an unsigned integer. In Bill's experiments, he used a combination of iterations and function evaluations. It seems to me, that a value around 30 should be fine.
Both of these commands, '
' and '
', pass any '
' parameters set before this line to the local searcher. If you forget to use this command, or give it before the '
' keywords, your choices will not take effect, and the default values for the optimizer will be used.
Instructs AutoDock to use the pseudo-Solis and Wets local searcher. This method maintains the relative proportions of variances for the translations in Å and the rotations in radians. These are typically 0.2 Å and 0.087 radians to start with, so the variance for translations will always be about 2.3 times larger than that for the rotations ( i.e. orientation and torsions).
This command instructs AutoDock to do the specifed number of docking runs using the simulated annealing (SA) search engine. This uses the value set by the "
" keyword as the number of SA docking runs to carry out. All relevant parameters for the simulated annealing job must be set first. These are indicated above by
in each keyword description.
This keyword instructs AutoDock to carry out only the local search of a global-local search; the genetic algorithm parameters are ignored, with the exception of the population size. This is an ideal way of carrying out a minimization using the same force field as is used during the dockings. The "
" keyword should not be given. The number after the keyword determines how many dockings will be performed.
This keyword instructs AutoDock to carry out dockings using only a global search, i.e. the traditional genetic algorithm. The local search parameters are ignored. The "
" keyword should not be given. The number after the keyword determines how many dockings will be performed.
This command invokes the new hybrid, Lamarckian genetic algorithm search engine, and performs the requested number of dockings. All appropriate parameters must be set first: these are listed above by "
This performs a cluster analysis on results of a docking, and outputs the results to the log file. The docked conformations are sorted in order of increasing energy, then compared by root mean square deviation. If the conformer is within the "rmstol" threshold, it is placed into the same cluster. A histogram is printed showing the number in each cluster, and if more than one member, the cluster's mean energy. Furthermore, a table is printed to the docking log file of cluster rmsd and reference rmsd values.