Our previous computer science class was our last class with Madam Noraslinda this semester. As students, we feel that time runs in a rush. It felt as if we had our first class just two weeks ago, but it is already mid-semester and the finals are getting closer. During her last class, one of the lessons we learned was how to write SMILES.
1. Definition of SMILES
SMILES stands for Simplified Molecular Input Line Entry System. It is a line notation for the chemical structure of molecules: it uses short ASCII strings to represent structures. Most molecule editor programs can convert a SMILES code into a two-dimensional diagram or a three-dimensional model of the molecule.
2. Graph-based definition
In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.
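To see the depth-first idea in action, here is a small Python sketch. It is only an illustration under simplifying assumptions: the graph has no cycles (so no ring numbers are needed), hydrogens are already removed, and the function name write_smiles and the (i, j, symbol) bond format are our own invention, not part of any library.

```python
def write_smiles(atoms, bonds, start=0):
    """Emit a SMILES-like string for an acyclic, hydrogen-suppressed graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j, symbol) tuples; symbol is "" for a single bond
    """
    # Build an adjacency list so each atom knows its neighbours.
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j, symbol in bonds:
        neighbours[i].append((j, symbol))
        neighbours[j].append((i, symbol))

    visited = set()

    def visit(node):
        visited.add(node)
        out = atoms[node]
        children = [(n, s) for n, s in sorted(neighbours[node]) if n not in visited]
        for index, (child, symbol) in enumerate(children):
            piece = symbol + visit(child)
            # Every child except the last one is written as a branch in parentheses.
            is_last = index == len(children) - 1
            out += piece if is_last else "(" + piece + ")"
        return out

    return visit(start)


# Propionic acid: C-C-C(=O)-O
atoms = ["C", "C", "C", "O", "O"]
bonds = [(0, 1, ""), (1, 2, ""), (2, 3, "="), (2, 4, "")]
print(write_smiles(atoms, bonds))
```

A real SMILES writer would also have to break rings and add the numeric labels described above, which this sketch leaves out.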
3. SMILES Bonds
There are four types of bonds, and SMILES has a symbol for each of them:
Single* -
Double =
Triple #
Aromatic* :
* The single and aromatic bond symbols can be omitted.
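As a quick check of the bond symbols, the Python sketch below counts the explicit bond symbols in a SMILES string. The names BOND_NAMES and count_explicit_bonds are made up for this post. One detail we had to assume: a '-' inside square brackets is a charge, not a bond, so the sketch skips everything inside brackets.

```python
BOND_NAMES = {"-": "single", "=": "double", "#": "triple", ":": "aromatic"}


def count_explicit_bonds(smiles):
    """Count explicitly written bond symbols, ignoring text inside [...]."""
    counts = {name: 0 for name in BOND_NAMES.values()}
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif ch in BOND_NAMES and not in_bracket:
            counts[BOND_NAMES[ch]] += 1
    return counts


print(count_explicit_bonds("CN=C=O"))  # two explicit double bonds
print(count_explicit_bonds("N#N"))     # one explicit triple bond
```

Single and aromatic bonds are usually left implicit, so a count of zero for them does not mean the molecule has none.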
4. Atoms
Atoms are represented by the standard abbreviation of the chemical element, in square brackets, such as [Au] for gold. Brackets can be omitted for the "organic subset" of B, C, N, O, P, S, F, Cl, Br, and I. All other elements must be enclosed in brackets. If the brackets are omitted, the proper number of implicit hydrogen atoms is assumed; for instance, the SMILES for water is simply O.
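The implicit hydrogen rule can be sketched in a few lines of Python. We assume the usual "normal valences" for the organic subset (for example 4 for carbon, and 3 or 5 for nitrogen): the hydrogen count fills the atom up to the lowest normal valence that is not smaller than the bonds already written. The names NORMAL_VALENCES and implicit_hydrogens are our own.

```python
# Normal valences assumed for the organic subset (lowest first).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(element, bond_order_sum):
    """Hydrogens assumed on an unbracketed atom with the given explicit bonds."""
    for valence in NORMAL_VALENCES[element]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # the explicit bonds already exceed every normal valence


print(implicit_hydrogens("O", 0))  # water, written as O
print(implicit_hydrogens("C", 3))  # the carbon in C(F)(F)F
```

This only covers unbracketed, non-aromatic atoms; atoms in brackets must state their hydrogens explicitly.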
An atom holding one or more electrical charges is enclosed in brackets. The element symbol is followed by the symbol H if the atom is bonded to one or more hydrogen atoms, then by the number of hydrogen atoms (as usual, the number one is omitted; for example, [NH4+] for ammonium), then by the sign '+' for a positive charge or '-' for a negative charge. The number of charges is specified after the sign (unless there is only one); however, it is also possible to write the sign as many times as the ion has charges: instead of [Ti+4], one can also write [Ti++++] (titanium(IV), Ti4+). Thus, the hydroxide anion is represented by [OH-], the oxonium cation is [OH3+], and the cobalt(III) cation (Co3+) is either [Co+3] or [Co+++].
5. Branches
Branches are described with parentheses, as in CCC(=O)O for propionic acid and C(F)(F)F for fluoroform. Substituted rings can be written with the branching point in the ring, as illustrated by the SMILES COc(c1)cccc1C#N and COc(cc1)ccc1C#N, which encode the 3- and 4-cyanoanisole isomers. Writing SMILES for substituted rings in this way can make them more human-readable.
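One way to picture how parentheses work is with a stack: an opening parenthesis remembers the current atom, and a closing parenthesis goes back to it. The Python sketch below does this for simple strings without rings or bracket atoms. The names TOKEN and branch_bonds are our own, and any character the pattern does not recognise is silently skipped, so this is not a validator.

```python
import re

# Organic-subset atoms, bond symbols and parentheses only (no rings, no brackets).
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#:]|[()]")


def branch_bonds(smiles):
    """Return (atoms, bonds) where bonds are (from_index, to_index, symbol)."""
    atoms, bonds = [], []
    stack = []        # atoms to return to when a branch closes
    previous = None   # index of the atom the next atom will bond to
    pending = "-"     # bond symbol waiting for the next atom (single by default)
    for token in TOKEN.findall(smiles):
        if token == "(":
            stack.append(previous)
        elif token == ")":
            previous = stack.pop()
        elif token in "-=#:":
            pending = token
        else:
            atoms.append(token)
            current = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, current, pending))
            previous = current
            pending = "-"
    return atoms, bonds


print(branch_bonds("CCC(=O)O"))
```

For CCC(=O)O, both oxygen atoms end up bonded to the third carbon (index 2), one by a double bond and one by a single bond.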
6. Understanding SMILES Notation
To write a correct SMILES string for a linear chemical structure:
- Firstly, identify the structure, including its atoms, bonds, and charges.
- Secondly, determine the main chain of the structure by choosing the longest chain.
- Thirdly, identify the branches.
- Lastly, write down the notation for the main chain and the branches identified above. For linear structures, carbon is written as a capital C (lowercase c is used for aromatic carbon).
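The steps above can be mimicked with a tiny Python helper. It assumes we have already picked the main chain and the branches by hand; the function name linear_smiles and its arguments are invented for this post.

```python
def linear_smiles(main_chain, branches=None):
    """Join a main chain with branches attached at given chain positions.

    main_chain: list of atom strings, optionally prefixed by a bond symbol
    branches:   dict mapping a chain position to a list of branch strings
    """
    branches = branches or {}
    parts = []
    for position, atom in enumerate(main_chain):
        parts.append(atom)
        # Each branch on this atom goes in its own pair of parentheses.
        for branch in branches.get(position, []):
            parts.append("(" + branch + ")")
    return "".join(parts)


# Propionic acid: main chain C-C-C-O, with =O branching from the third carbon.
print(linear_smiles(["C", "C", "C", "O"], {2: ["=O"]}))
```

The helper does no chemistry checking at all; it only shows where the parentheses go once the chain and branches are known.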
For Cyclic Structures
- Numbers indicate the start and stop of a ring.
- Break one single or one aromatic bond in each ring, and number the rings in any order. Designate the ring-breaking atoms by the same digit following the atomic symbol.
- The same number indicates the start and end of the ring, and it is entered immediately after the start and end atoms.
- Only the single digits 1–9 are used (larger ring numbers need a % prefix, such as %10).
- A number should appear exactly twice for each ring.
- An atom can be associated with two consecutive numbers, e.g., naphthalene: c12ccccc1cccc2
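The ring rules above can be checked with a short Python sketch that pairs up the ring digits. It is a simplified illustration: it handles single-digit labels only (no % labels), and the names ATOM_OR_DIGIT and ring_closures are our own.

```python
import re

# Bracket atoms first (so digits inside [...] are not read as ring labels),
# then organic-subset atoms, aromatic atoms, and single ring digits.
ATOM_OR_DIGIT = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]|\d")


def ring_closures(smiles):
    """Return (start_atom, end_atom) index pairs, one per ring-closure bond."""
    open_rings = {}   # ring digit -> index of the atom that opened it
    closures = []
    atom_index = -1
    for token in ATOM_OR_DIGIT.findall(smiles):
        if token.isdigit():
            if token in open_rings:
                closures.append((open_rings.pop(token), atom_index))
            else:
                open_rings[token] = atom_index
        else:
            atom_index += 1
    if open_rings:
        raise ValueError("unclosed ring label(s): " + ", ".join(sorted(open_rings)))
    return closures


print(ring_closures("c12ccccc1cccc2"))  # naphthalene
```

In naphthalene the first atom carries both digits 1 and 2, so it appears as the starting atom of both ring-closure bonds.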
What is the SMILES code for:
a) Nicotine
b) Vanillin
c) Thiamin
Answers:
a) Nicotine
CN1CCC[C@H]1c2cccnc2
b) Vanillin
O=Cc1ccc(O)c(OC)c1
c) Thiamin
OCCc1c(C)[n+](cs1)Cc2cnc(C)nc2N
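A simple way to sanity-check answers like these is to count the non-hydrogen atoms in the string and compare them with the known molecular formula (nicotine is C10H14N2 and vanillin is C8H8O3). The Python sketch below does that; the names ATOM, BRACKET_ELEMENT and heavy_atom_counts are made up for this post, and hydrogens are not counted at all.

```python
import re
from collections import Counter

# Bracket atoms, two-letter halogens, organic-subset atoms, aromatic atoms.
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Element symbol at the start of a bracket atom (after an optional isotope number).
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Za-z][a-z]?)")


def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms per element in a SMILES string."""
    counts = Counter()
    for token in ATOM.findall(smiles):
        if token.startswith("["):
            token = BRACKET_ELEMENT.match(token).group(1)
        counts[token.capitalize()] += 1  # aromatic "c" counts as carbon "C"
    counts.pop("H", None)  # ignore explicit [H] atoms
    return dict(counts)


print(heavy_atom_counts("CN1CCC[C@H]1c2cccnc2"))  # nicotine
print(heavy_atom_counts("O=Cc1ccc(O)c(OC)c1"))    # vanillin
```

Matching atom counts do not prove that a SMILES string is right, but a mismatch is a quick sign that an atom was dropped or added by mistake.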
7. Examples
Molecule | Structure | SMILES Formula |
---|---|---|
Dinitrogen | N≡N | N#N |
Methyl isocyanate (MIC) | CH₃–N=C=O | CN=C=O |
Copper(II) sulfate | Cu²⁺ SO₄²⁻ | [Cu+2].[O-]S(=O)(=O)[O-] |
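In the copper(II) sulfate example, the dot separates two parts that are not bonded to each other. The Python sketch below splits a SMILES string at the dots and adds up the charges written inside brackets for each part. The names BRACKET, CHARGE and component_charges are our own, and the sketch assumes charges are written either as repeated signs or as a sign followed by a number.

```python
import re

BRACKET = re.compile(r"\[([^\]]+)\]")   # contents of each bracket atom
CHARGE = re.compile(r"(\++|-+)(\d*)")   # "+", "++", "-", "+2", "-3", ...


def component_charges(smiles):
    """Return a list of (component, net_charge) for a dot-separated SMILES."""
    result = []
    for component in smiles.split("."):
        total = 0
        for content in BRACKET.findall(component):
            match = CHARGE.search(content)
            if match:
                signs, digits = match.groups()
                magnitude = int(digits) if digits else len(signs)
                total += magnitude if signs[0] == "+" else -magnitude
        result.append((component, total))
    return result


for part, charge in component_charges("[Cu+2].[O-]S(=O)(=O)[O-]"):
    print(part, charge)
```

The copper ion carries +2 and the sulfate carries -2, so the charges of the two parts cancel out.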
That is all we have learned about SMILES. I apologize if there are any mistakes in the explanation.
Some of the information is based on Wikipedia and the lecture notes.
Enjoy your reading and watch this video.