Friday, 8 November 2013

First Lesson about SMILES

Our previous computer science class was our last class with Madam Noraslinda this semester. As students, we feel that time is running fast. It feels like we had our first class just two weeks ago, yet it is already mid-semester and the finals are getting closer. One of the lessons in her last class was about how to write SMILES.

1. Definition of SMILES
SMILES stands for Simplified Molecular-Input Line-Entry System. It is a line notation that describes the chemical structure of a molecule using a short ASCII string. Most molecule editor programs can draw a two-dimensional diagram or a three-dimensional model of a molecule based on its SMILES code.

2. Graph-based definition
In terms of a graph-based computational procedure, a SMILES string is obtained by printing the symbols of the nodes encountered in a depth-first traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes.
Parentheses are used to indicate points of branching on the tree.
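
To see the procedure in action, here is a minimal Python sketch. It is only an illustration under some assumptions: the function name to_smiles and the input layout (a list of atom symbols plus a list of index pairs) are made up for this post, every bond is treated as a single bond, at most nine rings are supported, and the output is not a canonical SMILES.

```python
def to_smiles(atoms, bonds, start=0):
    """Write a SMILES-like string for a hydrogen-suppressed graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs; every bond is treated as single
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    children = {i: [] for i in neighbors}   # edges kept in the spanning tree
    closures = {i: [] for i in neighbors}   # ring-closure digits per atom
    visited, broken = set(), set()
    next_digit = 1

    def explore(atom, parent):
        """Depth-first search: build the spanning tree and break the cycles."""
        nonlocal next_digit
        visited.add(atom)
        for nb in neighbors[atom]:
            if nb == parent:
                continue
            if nb not in visited:
                children[atom].append(nb)
                explore(nb, atom)
            elif frozenset((atom, nb)) not in broken:
                # This bond would close a cycle: break it and label both ends.
                broken.add(frozenset((atom, nb)))
                closures[atom].append(str(next_digit))
                closures[nb].append(str(next_digit))
                next_digit += 1

    def write(atom):
        """Print the tree; all children except the last go in parentheses."""
        text = atoms[atom] + "".join(closures[atom])
        kids = children[atom]
        for kid in kids[:-1]:
            text += "(" + write(kid) + ")"
        if kids:
            text += write(kids[-1])
        return text

    explore(start, None)
    return write(start)


# Cyclohexane: six carbons in a ring
print(to_smiles(["C"] * 6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]))  # C1CCCCC1
# 2-Propanol: a branch on the middle carbon
print(to_smiles(["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)]))  # CC(C)O
```

The first pass decides which bonds stay in the tree and which are broken, and the second pass prints the atoms in the same depth-first order.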

3. SMILES Bonds
There are four types of bonds, and SMILES has a symbol for each of them:
Single*     -
Double      =
Triple      #
Aromatic*   :

* The symbol can be omitted.
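
A small Python sketch of how these symbols are used when writing a simple unbranched chain. The names BOND_SYMBOLS and chain_smiles are hypothetical helpers for this post, not part of any chemistry library.

```python
# Bond symbols from the table above
BOND_SYMBOLS = {"single": "-", "double": "=", "triple": "#", "aromatic": ":"}
# Bond types whose symbol may be left out
OMITTABLE = {"single", "aromatic"}


def chain_smiles(atoms, bond_types, explicit=False):
    """Join atoms of an unbranched chain with the right bond symbols.

    atoms: list of atom symbols, e.g. ["C", "N", "C", "O"]
    bond_types: one bond name per neighbouring pair of atoms
    explicit: if True, also write the symbols that are normally omitted
    """
    if len(bond_types) != len(atoms) - 1:
        raise ValueError("need exactly one bond between each pair of atoms")
    parts = [atoms[0]]
    for atom, bond in zip(atoms[1:], bond_types):
        if explicit or bond not in OMITTABLE:
            parts.append(BOND_SYMBOLS[bond])
        parts.append(atom)
    return "".join(parts)


print(chain_smiles(["N", "N"], ["triple"]))                               # N#N
print(chain_smiles(["C", "N", "C", "O"], ["single", "double", "double"]))  # CN=C=O
print(chain_smiles(["C", "C"], ["single"], explicit=True))                 # C-C
```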


4. Atoms
Atoms are represented by the standard abbreviation of the chemical elements, in square brackets, such as [Au] for gold. Brackets can be omitted for the "organic subset" of B, C, N, O, P, S, F, Cl, Br, and I. All other elements must be enclosed in brackets. If the brackets are omitted, the proper number of implicit hydrogen atoms is assumed; for instance the SMILES for water is simply O.
An atom holding one or more electrical charges is enclosed in brackets. Inside the brackets, the element symbol is followed by the symbol H if the atom is bonded to hydrogen, then by the number of hydrogen atoms (as usual, a count of one is omitted; for example, [NH4+] for ammonium), then by the sign '+' for a positive charge or '-' for a negative charge. The number of charges is written after the sign (unless there is only one); it is also possible to write the sign as many times as the ion has charges: instead of [Ti+4], one can also write [Ti++++] (titanium(IV), Ti4+). Thus, the hydroxide anion is represented by [OH-], the oxonium cation is [OH3+], and the cobalt(III) cation (Co3+) is either [Co+3] or [Co+++].
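
The two rules above can be sketched in Python as follows. The helper names bracket_atom and implicit_hydrogens are made up for this post, and the valence table is an assumption based on the usual valences of the organic subset; it is not taken from the lecture notes.

```python
def bracket_atom(symbol, hydrogens=0, charge=0):
    """Build a bracket atom: symbol, then H count, then charge."""
    text = symbol
    if hydrogens == 1:
        text += "H"                      # a count of one is omitted
    elif hydrogens > 1:
        text += f"H{hydrogens}"
    if charge:
        sign = "+" if charge > 0 else "-"
        size = abs(charge)
        text += sign if size == 1 else f"{sign}{size}"
    return f"[{text}]"


# Assumed usual valences for atoms written without brackets
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum=0):
    """Hydrogens assumed on an unbracketed atom with the given bonds."""
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0


print(bracket_atom("N", hydrogens=4, charge=1))   # [NH4+]
print(bracket_atom("O", hydrogens=1, charge=-1))  # [OH-]
print(bracket_atom("Co", charge=3))               # [Co+3]
print(implicit_hydrogens("O"))                    # 2, so O alone is water
```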

5. Branches
Branches are described with parentheses, as in CCC(=O)O for propionic acid and C(F)(F)F for fluoroform. Substituted rings can be written with the branching point in the ring, as illustrated by the SMILES COc(c1)cccc1C#N and COc(cc1)ccc1C#N, which encode the 3- and 4-cyanoanisole isomers. Writing SMILES for substituted rings in this way can make them more human-readable.
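
To make the role of the parentheses concrete, here is a short Python sketch that pulls the outermost branches out of a SMILES string. The function name top_level_branches is hypothetical, and the sketch only looks at parentheses; it does not check that the rest of the string is valid SMILES.

```python
def top_level_branches(smiles):
    """Return the text of each outermost parenthesised branch."""
    branches = []
    depth = 0
    start = None
    for index, char in enumerate(smiles):
        if char == "(":
            if depth == 0:
                start = index + 1        # branch text starts after "("
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')'")
            if depth == 0:
                branches.append(smiles[start:index])
    if depth != 0:
        raise ValueError("unmatched '('")
    return branches


print(top_level_branches("CCC(=O)O"))         # ['=O']
print(top_level_branches("C(F)(F)F"))         # ['F', 'F']
print(top_level_branches("COc(cc1)ccc1C#N"))  # ['cc1']
```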

6. Understanding SMILES Notation
To write the SMILES string for a linear (non-cyclic) structure:
  • Firstly, identify the structure, including its atoms, bonds and charges.
  • Secondly, determine the main chain by choosing the longest chain in the structure.
  • Thirdly, identify the branches attached to the main chain.
  • Lastly, write down the notation by following the chain identified earlier, placing each branch in parentheses after the atom it is attached to. Non-aromatic carbon atoms are written with a capital C.
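
The steps above can be sketched in Python like this. The function linear_smiles and its input layout (a main chain plus a dictionary of branches by position) are assumptions made for this illustration only.

```python
def linear_smiles(main_chain, branches=None):
    """Write a non-cyclic structure from its main chain and branches.

    main_chain: atom symbols in order, with any bond symbol in front,
                e.g. ["C", "C", "C", "O"]
    branches: maps a position in the main chain (starting from 0) to the
              list of branch strings attached to that atom
    """
    branches = branches or {}
    parts = []
    for position, atom in enumerate(main_chain):
        parts.append(atom)
        # Each branch goes in parentheses right after its atom
        for branch in branches.get(position, []):
            parts.append(f"({branch})")
    return "".join(parts)


# Propionic acid: chain C-C-C-O with a =O branch on the third carbon
print(linear_smiles(["C", "C", "C", "O"], {2: ["=O"]}))  # CCC(=O)O
# 2-Methylbutane: four-carbon chain with a methyl branch on the second carbon
print(linear_smiles(["C", "C", "C", "C"], {1: ["C"]}))   # CC(C)CC
```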

For cyclic structures:
  • Numbers indicate the start and the end of a ring.
  • Break one single or one aromatic bond in each ring.
  • Rings can be numbered in any order. Mark the two ring-breaking atoms with the same digit, written immediately after the atomic symbol.
  • Only the digits 1–9 are used (numbers above 9 need a % prefix, e.g. %10).
  • Each number should appear exactly twice, once at each end of the broken bond.
  • An atom can carry two consecutive numbers when it closes two rings, e.g. naphthalene: c12ccccc1cccc2
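
As a check on the ring rules, this Python sketch pairs up the ring-closure digits in a SMILES string and reports which atoms each broken bond connects. The name ring_closures and the simple tokenizer are assumptions for this post; the sketch handles single digits only and skips the %nn form.

```python
import re

# Bracket atoms, two-letter halogens, other organic-subset atoms,
# aromatic atoms, digits, then any other single character
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\d|.")


def ring_closures(smiles):
    """Return (start_atom, end_atom) index pairs, one per ring closure."""
    open_rings = {}     # digit -> index of the atom that opened it
    pairs = []
    atom_index = -1
    for token in TOKEN.findall(smiles):
        if token.startswith("[") or token.isalpha():
            atom_index += 1
        elif token.isdigit():
            if token in open_rings:
                # Second appearance of the digit: the ring is closed
                pairs.append((open_rings.pop(token), atom_index))
            else:
                open_rings[token] = atom_index
    if open_rings:
        raise ValueError(f"unclosed ring digits: {sorted(open_rings)}")
    return pairs


print(ring_closures("C1CCCCC1"))        # [(0, 5)]
print(ring_closures("c12ccccc1cccc2"))  # [(0, 5), (0, 9)]
```

For naphthalene, the first atom appears in both pairs, which matches the rule that one atom can carry two numbers.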


What is the SMILES code for:
a) Nicotine
b) Vanillin
c) Thiamin

Answers:
a) Nicotine
CN1CCC[C@H]1c2cccnc2
b) Vanillin
O=Cc1ccc(O)c(OC)c1

c) Thiamin
OCCc1c(C)[n+](cs1)Cc2cnc(C)nc2N
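
One quick way to sanity-check answers like these is to count the atoms written in the string and compare them with the molecular formula. The Python sketch below does that with a simple regular expression; the name heavy_atom_counts is made up for this post, and hydrogens (implicit or written inside brackets) are not counted.

```python
import re
from collections import Counter

# Either a bracket atom (capturing its element symbol) or a bare atom
ATOM = re.compile(r"\[\d*([A-Za-z][a-z]?)[^\]]*\]|(Br|Cl|[BCNOPSFI]|[bcnops])")


def heavy_atom_counts(smiles):
    """Count the atoms written in a SMILES string, grouped by element."""
    counts = Counter()
    for bracket_symbol, bare_symbol in ATOM.findall(smiles):
        symbol = bracket_symbol or bare_symbol
        # Aromatic atoms are lowercase; capitalize to merge c with C
        counts[symbol.capitalize()] += 1
    return dict(counts)


print(heavy_atom_counts("CN1CCC[C@H]1c2cccnc2"))  # {'C': 10, 'N': 2}
print(heavy_atom_counts("O=Cc1ccc(O)c(OC)c1"))    # {'O': 3, 'C': 8}
```

These counts agree with the formulas of nicotine (C10H14N2) and vanillin (C8H8O3).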

7.  Examples

Molecule                   Structure       SMILES formula
Dinitrogen                 N≡N             N#N
Methyl isocyanate (MIC)    CH3–N=C=O       CN=C=O
Copper(II) sulfate         Cu²⁺ SO₄²⁻      [Cu+2].[O-]S(=O)(=O)[O-]
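
In the copper(II) sulfate example, the dot separates two parts that are not bonded to each other. The Python sketch below splits a SMILES string at the dots and adds up the charges written in the brackets of each part. The names bracket_charge and fragment_charges are hypothetical helpers for this post.

```python
import re

BRACKET = re.compile(r"\[[^\]]+\]")
CHARGE = re.compile(r"(\++|-+)(\d*)")


def bracket_charge(bracket):
    """Read the charge of one bracket atom such as [Cu+2] or [Co+++]."""
    match = CHARGE.search(bracket)
    if not match:
        return 0
    signs, digits = match.groups()
    sign = 1 if signs[0] == "+" else -1
    # Either a number after the sign, or the sign repeated
    size = int(digits) if digits else len(signs)
    return sign * size


def fragment_charges(smiles):
    """Split at '.' and return (fragment, net charge) for each part."""
    result = []
    for fragment in smiles.split("."):
        charge = sum(bracket_charge(b) for b in BRACKET.findall(fragment))
        result.append((fragment, charge))
    return result


print(fragment_charges("[Cu+2].[O-]S(=O)(=O)[O-]"))
# [('[Cu+2]', 2), ('[O-]S(=O)(=O)[O-]', -2)]
```

The two charges cancel out, as expected for a neutral salt.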

That is all that we have learned about SMILES. I apologize if there are any mistakes in the explanation.
Some of the information is based on Wikipedia and the lecture notes.

Enjoy your reading, and watch this video.



