Tuesday 8 October 2013

PDB: Protein Data Bank

    The Protein Data Bank (PDB) is a huge database containing information about the 3D structures of large biological molecules, mainly proteins and nucleic acids. These proteins and nucleic acids can come from any organism: bacteria, fungi, archaea, plants, animals and humans, for instance.

    The database is free for anyone to use. Information about a structure can help biologists, chemists and other researchers to better understand the functions of these molecules. This knowledge can then be applied in other fields, including medicine, veterinary science, drug development and many more.

    The database was originally established in 1971 at Brookhaven National Laboratory, when it contained only seven structures. In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became the body responsible for the management of the PDB.

    The data are obtained from structures that biologists submit to the RCSB. After validation, which normally takes about two weeks, a structure is included in the existing database if it is deemed to be correct. The database is updated each week at the target time of Wednesday 00:00 UTC (Coordinated Universal Time). The number of structures available in the PDB has been growing exponentially in recent years. The table below shows how the number has changed over the years:

    Year    Number of searchable structures
    1976    13
    1981    85
    1986    213
    1991    694
    1996    4,988
    2001    16,428
    2006    40,608
    2011    78,111
    (source: Wikipedia)
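
    A rough way to look at the growth in the table above is to compute how many times the number of structures multiplied from one row to the next. Under strictly exponential growth, that factor would be about the same for every five-year step. The snippet below is only a small sketch using the table's figures; the names counts and growth_factors are my own and do not come from the PDB.

```python
# Structure counts copied from the table above (source: Wikipedia).
counts = {
    1976: 13,
    1981: 85,
    1986: 213,
    1991: 694,
    1996: 4988,
    2001: 16428,
    2006: 40608,
    2011: 78111,
}


def growth_factors(counts):
    """Return {(start_year, end_year): factor} for consecutive rows.

    The factor is simply the later count divided by the earlier count.
    """
    years = sorted(counts)
    return {(a, b): counts[b] / counts[a] for a, b in zip(years, years[1:])}


if __name__ == "__main__":
    for (start, end), factor in growth_factors(counts).items():
        print(f"{start}-{end}: x{factor:.1f}")
```

    Comparing the printed factors shows how steady (or not) the growth has been from one five-year period to the next.
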
    To illustrate how useful the PDB is, I will briefly mention five proteins (all of which are hydrolases) in this post. All data were obtained from the PDB on the internet.

*RasMol is a program that can be used to view the molecular structure files downloaded from the PDB.

1. Subtilisin

Subtilisin BPN' viewed in RasMol
Assumed biological molecule of subtilisin BPN'
Name of protein: Subtilisin BPN'
Brookhaven code: 1SBT
Organism: Bacillus amyloliquefaciens (bacteria)
Gene name: apr
Number of chains: 2

2. Lon A

ATP-dependent protease lon as viewed in RasMol
Assumed biological molecule of ATP-dependent protease lon
Name of protein: ATP-dependent protease lon
Brookhaven code: 3KIJ
Organism: Thermococcus onnurineus NA1 (archaea)
Gene name: TON_0529
Number of chains: 6

3. Dipeptidase

Renal dipeptidase as viewed in RasMol
Assumed biological molecule of renal dipeptidase
Name of protein: Renal dipeptidase
Brookhaven code: 1ITQ
Organism: Homo sapiens (eukarya)
Gene name: DPEP1 MDP RDP
Number of chains: 6

4. Carboxypeptidase

Carboxypeptidase GP180 Residues 503-882 as viewed in RasMol
Assumed biological molecule of carboxypeptidase GP180 residues 503-882

Name of protein: Carboxypeptidase GP180 Residues 503-882
Brookhaven code: 1QMU
Organism: Lophonetta specularioides (eukarya)
Gene name: CPD
Number of chains: 2

5. DegP

Protease do as viewed in RasMol
Assumed biological molecule of protease do
Name of protein: Protease do
Brookhaven code: 3CSO
Organism: Escherichia coli (bacteria)
Gene name: degP htrA ptd b0161 JW0157
Number of chains: 2
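
    For anyone who wants to look inside the downloaded files instead of only viewing them, here is a small sketch of how the chains in a PDB-format file could be listed. It relies on the PDB file format, where each ATOM record stores the chain identifier in column 22. The function name chain_ids is my own invention, and the chains found in a file's coordinate section may not match the number shown for the assumed biological molecule, so treat it as a rough check only.

```python
def chain_ids(pdb_text):
    """Return the sorted, distinct chain identifiers found in ATOM records.

    Assumes the fixed-column PDB format, in which the chain identifier
    sits in column 22 (index 21 when counting from zero).
    """
    chains = set()
    for line in pdb_text.splitlines():
        # Only ATOM records are read here; HETATM records (ligands,
        # water molecules) are deliberately ignored.
        if line.startswith("ATOM") and len(line) > 21:
            chains.add(line[21])
    return sorted(chains)
```

    With a structure file saved on your computer, you would read the file's text and pass it to chain_ids; the length of the returned list is the number of chains found in that file.
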

    There you have it. I hope I have managed to illustrate (in a limited way, since I am a CTS student) what type of data we can obtain from the PDB. If you go to the database itself, I bet you will discover more about proteins and DNA. The link is provided here:


    If you do visit the PDB, you will see that the information about proteins and DNA is so extensive that it requires such a huge database to store all of it. Imagine if we humans wanted to have a database of all the facts there are; I think it would be a challenge. But for Allah, He could do that. He knows about everything, and He himself is sufficient to do so, because He is great like that. So I think it is fitting to leave this song to end the post. It is Maher Zain's "Open Your Eyes", just to remind us how wonderful all of Allah's creations are.