The
Protein Data Bank (PDB) is a
repository for the 3-D structural data of large biological
molecules, such as
proteins and
nucleic acids. (See also
crystallographic database). The
data, typically obtained by
X-ray
crystallography or
NMR spectroscopy
and submitted by
biologists and
biochemists from around the world, can be
accessed at no charge on the internet. The PDB is overseen by an
organization called the
Worldwide Protein Data Bank,
wwPDB.
The PDB is a key resource in areas of
structural biology, such as
structural genomics.
Most major scientific
journals, and some funding agencies, such as the NIH
in the USA, now require
scientists to submit their structure data to the PDB. If the
contents of the PDB are thought of as primary data, then there are
hundreds of derived (i.e., secondary) databases that categorize the
data differently. For example, both
SCOP and
CATH categorize structures according to type of
structure and assumed evolutionary relations;
GO categorizes structures based on genes.
History
The PDB originated as a grassroots effort.
In 1971, Walter
Hamilton of the Brookhaven National Laboratory
agreed to set up the data bank at
Brookhaven. Upon Hamilton's death in 1973, Tom Koetzle took
over direction of the PDB. In January 1994,
Joel Sussman was appointed head of the PDB. In
October 1998, the PDB was transferred to the
Research Collaboratory for
Structural Bioinformatics (RCSB); the transfer was completed in
June 1999. The new director was
Helen
M. Berman of
Rutgers
University
(one of the member institutions of the
RCSB). In 2003, with the formation of the wwPDB, the PDB
became an international organization. Each of the four members of
wwPDB can act as a
deposition, data processing and distribution center for PDB data.
Data processing here means that wwPDB staff review and
annotate each submitted entry. The data are then automatically
checked for plausibility. (The
source code for this validation software has been made
available to the public at no charge.)
Contents
The PDB is updated weekly (on Tuesday), as is the
PDB Holdings List. The
breakdown of current holdings was as follows:
- 42,085 structures in the PDB have a structure factor file.
- 5,401 structures have an NMR restraint file.
These data show that most structures are determined by X-ray
diffraction, but about 15% of structures are now determined by
protein NMR, and a few are even
determined by
cryo-electron
microscopy.
The significance of the structure factor files, mentioned above, is
that, for PDB structures determined by X-ray diffraction that have
a structure factor file, the electron density map may be viewed. The data
for such structures are stored on the "electron density server",
where the electron density maps can be viewed.
Historically, the
number of structures in the PDB grew
nearly exponentially. In 2007, 7263 structures were added. In 2008,
however, only 7073 structures were added, which suggests that the rate
of new depositions had begun to slow.
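The size of that slowdown can be worked out directly from the two yearly counts quoted above. The following is a minimal sketch; the helper name `relative_change` is made up for illustration, and only the two figures from the text are used.

```python
# Yearly counts of added structures, as quoted in the text above.
added = {2007: 7263, 2008: 7073}


def relative_change(previous: int, current: int) -> float:
    """Return the fractional change from `previous` to `current`."""
    return (current - previous) / previous


difference = added[2008] - added[2007]
change = relative_change(added[2007], added[2008])

# Prints: 2007 -> 2008: -190 structures (-2.6%)
print(f"2007 -> 2008: {difference:+d} structures ({change:+.1%})")
```

A drop of roughly 2.6% in a single year is small, so two data points alone do not establish a long-term trend.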
File format
The file format initially used by the PDB was called the
PDB file format. This
original format was restricted by the width of computer punch cards
to 80 characters per line. Around 1996, the "macromolecular
Crystallographic Information file" format, mmCIF, started to be
phased in. An XML version of this format, called PDBML, was
described in 2005. The structure files can be downloaded in any of
these three formats. In fact, individual files are easily
downloaded into graphics packages using web addresses:
- For PDB format files, use, e.g.,
http://www.pdb.org/pdb/files/4hhb.pdb.gz
- For PDBML (XML) files, use, e.g.,
http://www.pdb.org/pdb/files/4hhb.xml.gz
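The two example addresses follow the same pattern, so they can be generated from an entry code and a format name. The sketch below only builds the address strings and does not contact any server. The function name `structure_url` and the format keys `"pdb"` and `"pdbml"` are hypothetical, and the host and path are copied from the examples above, so they may no longer be served in this form.

```python
# Base address copied from the examples above (assumption: it may be outdated).
BASE_URL = "http://www.pdb.org/pdb/files"

# File-name suffixes for the two formats shown in the list above.
SUFFIXES = {"pdb": ".pdb.gz", "pdbml": ".xml.gz"}


def structure_url(pdb_id: str, file_format: str = "pdb") -> str:
    """Build the download address for one entry in the given format."""
    try:
        suffix = SUFFIXES[file_format]
    except KeyError:
        raise ValueError(f"unsupported format: {file_format!r}") from None
    # The examples use a lower-case entry code in the file name.
    return f"{BASE_URL}/{pdb_id.lower()}{suffix}"


print(structure_url("4hhb"))
print(structure_url("4HHB", "pdbml"))
```

Both files are gzip-compressed, so a downloaded file would need to be decompressed (for example with Python's `gzip` module) before it is read as text.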
The "4hhb" is the PDB identifier. Each structure
published in the PDB receives a four-character alphanumeric identifier,
its PDB ID. (This cannot be used as an identifier for biomolecules,
because several structures of the same molecule, in different
environments or conformations, are often contained in the PDB under
different PDB IDs.)
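A program that accepts PDB IDs from users can check the four-character alphanumeric shape described above before using the value. This is a minimal sketch: the function name `normalize_pdb_id` is hypothetical, and treating upper and lower case as equivalent is an assumption, not something stated in the text.

```python
import re

# Exactly four alphanumeric characters, as described above.
_PDB_ID = re.compile(r"[0-9A-Za-z]{4}")


def normalize_pdb_id(text: str) -> str:
    """Return the ID in lower case, or raise ValueError if the shape is wrong."""
    candidate = text.strip()
    if not _PDB_ID.fullmatch(candidate):
        raise ValueError(f"not a four-character alphanumeric ID: {text!r}")
    # Assumption: case is not significant, so IDs are stored in lower case.
    return candidate.lower()


print(normalize_pdb_id(" 4HHB "))
```

The check only confirms the shape of the string. It cannot tell whether an entry with that ID exists, or which molecule it describes.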
Viewing the data
The structure files may be viewed using one of
several
open-source computer programs. Other free, but not
open-source, programs include
VMD,
MDL
Chime,
Swiss-PDB Viewer,
StarBiochem
(a Java-based interactive molecular viewer with integrated search
of the Protein Data Bank), and
Sirius. The RCSB PDB website
contains an
extensive list of both free and commercial
molecule visualization programs and web browser plugins.
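Before any viewer can draw a structure, it has to read atom coordinates from the file. For the original fixed-width PDB format, this mostly means slicing each line by column position. The sketch below reads a single coordinate record using the column ranges of the published PDB format for ATOM and HETATM records. The `Atom` type, its field names, and the sample record are illustrative and are not taken from any particular viewer or entry.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM/HETATM record by column (comments give 1-based columns)."""
    if line[0:6] not in ("ATOM  ", "HETATM"):
        raise ValueError("not an ATOM or HETATM record")
    return Atom(
        serial=int(line[6:11]),           # columns 7-11
        name=line[12:16].strip(),         # columns 13-16
        residue=line[17:20].strip(),      # columns 18-20
        chain=line[21],                   # column 22
        residue_number=int(line[22:26]),  # columns 23-26
        x=float(line[30:38]),             # columns 31-38
        y=float(line[38:46]),             # columns 39-46
        z=float(line[46:54]),             # columns 47-54
    )


# Illustrative record, assembled field by field so the widths are visible.
SAMPLE = (
    "ATOM  " "    1" " " " N  " " " "VAL" " " "A" "   1" "    "
    "   6.204" "  16.869" "   4.854" "  1.00" " 49.05"
)

print(parse_atom_line(SAMPLE))
```

Real viewers handle many more record types and edge cases, so this is a starting point for experimentation and not a complete parser.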