The Protein Data Bank (PDB) is a huge database containing information about the 3D structures of large biological molecules, mainly proteins and nucleic acids. These molecules can come from any organism: bacteria, fungi, archaea, plants, animals and humans, for instance.
The database is free for anyone to use. Information about a structure can help biologists, chemists or any other researcher to better understand the function of these molecules. This knowledge can then be applied in other fields, including medicine, veterinary science, drug development and many more.
The database was originally established in 1971 at Brookhaven National Laboratory, containing only seven structures. In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became the body responsible for the management of the PDB.
The data come from structures that biologists submit to the RCSB. After validation, which normally takes about two weeks, a structure is added to the existing database if it is deemed to be correct. The database is updated each week at the target time of Wednesday 00:00 UTC (Coordinated Universal Time). The number of structures available in the PDB has been growing exponentially in recent years. The table below shows how the number has changed over the years:
(source: Wikipedia)
Year | Number of searchable structures |
---|---|
1976 | 13 |
1981 | 85 |
1986 | 213 |
1991 | 694 |
1996 | 4 988 |
2001 | 16 428 |
2006 | 40 608 |
2011 | 78 111 |
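One way to look at the growth in the table is to compute how many times the number of structures multiplied over each five-year interval. The short Python sketch below does that using only the figures from the table; the names `structures` and `growth_factors` are my own and are not part of any PDB tool.

```python
# Searchable structures per year, copied from the table above.
structures = {
    1976: 13,
    1981: 85,
    1986: 213,
    1991: 694,
    1996: 4988,
    2001: 16428,
    2006: 40608,
    2011: 78111,
}


def growth_factors(counts):
    """Map each pair of consecutive years to the ratio of their counts."""
    years = sorted(counts)
    return {
        (start, end): counts[end] / counts[start]
        for start, end in zip(years, years[1:])
    }


factors = growth_factors(structures)
for (start, end), factor in factors.items():
    # A factor above 1 means the database grew during that interval.
    print(f"{start}-{end}: x{factor:.2f}")
```

Every interval gives a factor greater than 1, so the database multiplied in size in each five-year period rather than growing by a fixed amount.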
To illustrate how useful the PDB is, I will briefly mention five proteins (all of them hydrolases) in this post. All data were obtained from the PDB website.
*RasMol is a program that can be used to view the molecular structure files downloaded from the PDB.
1. Subtilisin
Subtilisin BPN' as viewed in RasMol
Assumed biological molecule of subtilisin BPN'
Name of protein: Subtilisin BPN'
PDB ID (Brookhaven code): 1SBT
Organism: Bacillus amyloliquefaciens (bacteria)
Gene name: apr
Number of chains: 2
2. Lon A
ATP-dependent protease lon as viewed in RasMol
Assumed biological molecule of ATP-dependent protease lon
Name of protein: ATP-dependent protease lon
PDB ID (Brookhaven code): 3KIJ
Organism: Thermococcus onnurineus NA1 (archaea)
Gene name: TON_0529
Number of chains: 6
3. Dipeptidase
Renal dipeptidase as viewed in RasMol
Assumed biological molecule of renal dipeptidase
Name of protein: Renal dipeptidase
PDB ID (Brookhaven code): 1ITQ
Organism: Homo sapiens (eukarya)
Gene name: DPEP1 MDP RDP
Number of chains: 6
4. Carboxypeptidase
Carboxypeptidase GP180 residues 503-882 as viewed in RasMol
Assumed biological molecule of carboxypeptidase GP180 residues 503-882
Name of protein: Carboxypeptidase GP180 residues 503-882
PDB ID (Brookhaven code): 1QMU
Organism: Lophonetta specularioides (eukarya)
Gene name: CPD
Number of chains: 2
5. DegP
Protease do as viewed in RasMol
Assumed biological molecule of protease do
Name of protein: Protease do
PDB ID (Brookhaven code): 3CSO
Organism: Escherichia coli (bacteria)
Gene name: degP htrA ptd b0161 JW0157
Number of chains: 2
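The "number of chains" listed for each protein above can, in principle, also be read straight from a downloaded structure file. Below is a minimal Python sketch, assuming the legacy text-based PDB file format, in which each ATOM record carries its chain identifier in column 22. The function name `chain_ids` and the three sample records are made up for illustration; they are not taken from any of the entries above.

```python
def chain_ids(pdb_text):
    """Return the sorted, distinct chain IDs found in ATOM records."""
    chains = set()
    for line in pdb_text.splitlines():
        # In the legacy PDB format the record name starts the line and
        # the chain identifier is in column 22 (index 21 in Python).
        if line.startswith("ATOM") and len(line) > 21:
            chains.add(line[21])
    return sorted(chains)


# Hypothetical, hand-written records: two residues in chain A, one in chain B.
sample = "\n".join([
    "ATOM      1  CA  ALA A   1",
    "ATOM      2  CA  GLY A   2",
    "ATOM      3  CA  SER B   1",
])

print(chain_ids(sample))       # ['A', 'B']
print(len(chain_ids(sample)))  # 2
```

Note that this counts the chains present in the coordinate records of one file, which may not always match the chain count shown for an entry on the PDB website.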
There you have it. I hope I have managed to illustrate (in a limited way, since I am a CTS student) what type of data we can obtain from the PDB. If you go to the database itself, I bet you will discover more about proteins and DNA. The link is provided here:
If you do visit the PDB, you will see that the information about proteins and DNA is so extensive that it takes a huge database to store it all. Imagine if we humans wanted a database for all the facts there are; I think it would be a challenge. But Allah could do that. He knows about everything, and He Himself is sufficient to do so, because He is great like that. So I think it is fitting to end the post with this song. It is Maher Zain's "Open Your Eyes", just to remind us how wonderful all of Allah's creation is.