Friday 8 November 2013

First Lesson about SMILES

Our previous computer science class was our last class with Madam Noraslinda this semester. As students, we feel time running in a rush. It feels as if we had our first class just two weeks ago, yet it is already mid-semester and the finals are getting closer. One of the lessons in her last class was about how to write SMILES.

1. Definition of SMILES
SMILES stands for Simplified Molecular Input Line Entry System. It is a line notation for the chemical structure of molecules that uses short ASCII strings to represent structures. Most molecule editor computer programs can draw a two-dimensional diagram or a three-dimensional model of a molecule based on its SMILES code.

2. Graph-based definition
In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbols of the nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes.
Parentheses are used to indicate points of branching on the tree.
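
To make the procedure above more concrete, here is a minimal Python sketch of it. This is only our own illustration, not the algorithm of any real chemistry program: the function name to_smiles and the input format (a list of element symbols plus a list of (atom, atom, bond order) tuples) are assumptions, hydrogens are assumed to be removed already, and aromatic atoms, charges and more than nine rings are not handled.

```python
from collections import defaultdict
from itertools import count

BOND_SYMBOL = {1: "", 2: "=", 3: "#"}  # single bonds are left implicit


def to_smiles(atoms, bonds, start=0):
    """Write a SMILES string by a depth-first walk of a hydrogen-free graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j, order) tuples referring to positions in atoms
    """
    neighbours = defaultdict(list)
    for i, j, order in bonds:
        neighbours[i].append((j, order))
        neighbours[j].append((i, order))

    children = defaultdict(list)      # spanning-tree edges: parent -> children
    ring_labels = defaultdict(list)   # numeric labels for the broken bonds
    visited = set()
    broken = set()
    digits = count(1)

    def explore(atom, parent):
        # Pass 1: build the spanning tree and break every cycle once.
        visited.add(atom)
        for nbr, order in neighbours[atom]:
            if nbr == parent:
                continue
            if nbr not in visited:
                children[atom].append((nbr, order))
                explore(nbr, atom)
            elif frozenset((atom, nbr)) not in broken:
                broken.add(frozenset((atom, nbr)))
                label = str(next(digits))
                ring_labels[nbr].append(label)
                ring_labels[atom].append(BOND_SYMBOL[order] + label)

    def write(atom):
        # Pass 2: print the tree; every child except the last is a branch.
        text = atoms[atom] + "".join(ring_labels[atom])
        kids = children[atom]
        for position, (child, order) in enumerate(kids):
            piece = BOND_SYMBOL[order] + write(child)
            if position < len(kids) - 1:
                piece = "(" + piece + ")"
            text += piece
        return text

    explore(start, None)
    return write(start)


# Propionic acid: three carbons in a chain, then =O and -O on the last one.
print(to_smiles(["C", "C", "C", "O", "O"],
                [(0, 1, 1), (1, 2, 1), (2, 3, 2), (2, 4, 1)]))  # CCC(=O)O
```

For a ring such as cyclohexane, the bond that closes the cycle is not a tree edge, so it turns into a pair of matching digits instead.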

3. SMILES BONDS
There are four types of bonds, and SMILES has a symbol for each of them:
Single*    -
Double     =
Triple     #
Aromatic*  :

* These symbols can be omitted.
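
As a small illustration of the bond symbols, the Python sketch below lists the bonds that are written out explicitly in a SMILES string. The names BOND_NAME and explicit_bonds are made up for this post. Characters inside square brackets are skipped, because there '-' means a negative charge and not a bond.

```python
BOND_NAME = {"-": "single", "=": "double", "#": "triple", ":": "aromatic"}


def explicit_bonds(smiles):
    """Return the names of the bond symbols written outside square brackets."""
    found = []
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and ch in BOND_NAME:
            found.append(BOND_NAME[ch])
    return found


print(explicit_bonds("N#N"))      # ['triple']
print(explicit_bonds("CN=C=O"))   # ['double', 'double']
print(explicit_bonds("CC"))       # [] because the single bond is omitted
```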


4. Atoms
Atoms are represented by the standard abbreviation of the chemical elements, in square brackets, such as [Au] for gold. Brackets can be omitted for the "organic subset" of B, C, N, O, P, S, F, Cl, Br, and I. All other elements must be enclosed in brackets. If the brackets are omitted, the proper number of implicit hydrogen atoms is assumed; for instance the SMILES for water is simply O.
An atom holding one or more electrical charges is enclosed in brackets. The element symbol is followed by the symbol H if the atom is bonded to one or more hydrogen atoms, then by the number of hydrogen atoms (as usual, a count of one is omitted; for example, [NH4+] for ammonium), then by the sign '+' for a positive charge or '-' for a negative charge. The number of charges is specified after the sign (unless there is only one); however, it is also possible to write the sign as many times as the ion has charges: instead of [Ti+4], one can also write [Ti++++] (titanium(IV), Ti4+). Thus, the hydroxide anion is represented by [OH-], the hydronium (oxonium) cation by [OH3+], and the cobalt(III) cation (Co3+) by either [Co+3] or [Co+++].
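
The bracket rules can be tried out with a short Python sketch. It is a simplified reader written for this post: the name parse_bracket_atom is our own, and it only understands an element symbol, an optional hydrogen count and an optional charge (no isotopes, chirality marks or aromatic symbols).

```python
import re

# [symbol][H + optional count][sign(s) + optional number]
BRACKET_ATOM = re.compile(
    r"\[(?P<symbol>[A-Z][a-z]?)"
    r"(?:H(?P<hcount>\d*))?"
    r"(?:(?P<sign>\++|-+)(?P<number>\d*))?\]"
)


def parse_bracket_atom(text):
    """Return (symbol, hydrogen count, charge) for a simple bracket atom."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a simple bracket atom: {text!r}")
    # "H" with no digit means one hydrogen; no "H" at all means none.
    if match["hcount"] is None:
        hydrogens = 0
    else:
        hydrogens = int(match["hcount"] or 1)
    # "+3" and "+++" both mean a charge of +3.
    sign = match["sign"]
    if sign is None:
        charge = 0
    else:
        magnitude = int(match["number"]) if match["number"] else len(sign)
        charge = magnitude if sign[0] == "+" else -magnitude
    return match["symbol"], hydrogens, charge


print(parse_bracket_atom("[OH-]"))    # ('O', 1, -1)
print(parse_bracket_atom("[Co+3]"))   # ('Co', 0, 3)
print(parse_bracket_atom("[Co+++]"))  # ('Co', 0, 3)
```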

5. Branches
Branches are described with parentheses, as in CCC(=O)O for propionic acid and C(F)(F)F for fluoroform. Substituted rings can be written with the branching point in the ring, as illustrated by the SMILES COc(c1)cccc1C#N and COc(cc1)ccc1C#N, which encode the 3- and 4-cyanoanisole isomers. Writing SMILES for substituted rings in this way can make them more human-readable.
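
To see which parts of a SMILES string are branches, the parentheses can be matched with a stack. The Python sketch below is only an illustration (find_branches is a name we made up); it returns the text inside each pair of parentheses and complains if the parentheses do not balance.

```python
def find_branches(smiles):
    """Return the contents of every parenthesised branch, in closing order."""
    stack = []      # positions of "(" that are still open
    branches = []
    for pos, ch in enumerate(smiles):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            start = stack.pop()
            branches.append(smiles[start + 1:pos])
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return branches


print(find_branches("CCC(=O)O"))          # ['=O']
print(find_branches("C(F)(F)F"))          # ['F', 'F']
print(find_branches("COc(cc1)ccc1C#N"))   # ['cc1']
```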

6. Understanding SMILES Notation
To write the SMILES string for a linear chemical structure:
  • Firstly, identify the structure, including its bonds, atoms, and charges.
  • Secondly, determine the main chain of the structure by choosing the longest chain.
  • Thirdly, identify the branches.
  • Lastly, write down the notation for the chain and branches that you have identified. In a linear (non-aromatic) structure, carbon is written as a capital C.

For Cyclic Structures
  • Numbers indicate start and stop of ring
    Break one single or one aromatic bond in each ring
  • Number in any order. Designate ring-breaking atoms by the same digit following the atomic symbol
  • Same number indicates start and end of the ring, entered immediately following the start/end atoms
  • Only the digits 1 to 9 are normally used
  • A number should appear only twice
  • An atom can be associated with 2 consecutive numbers, e.g., Naphthalene: c12ccccc1cccc2
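
The ring rules can be checked with a few lines of Python. In this sketch (the name ring_closures and the simple tokenizer are our own assumptions) every ring digit is paired with its partner, and the result is a list of (start atom, end atom) positions, counting atoms from 0. Digits inside square brackets are hydrogen counts or charges, so the whole bracket is read as one atom.

```python
import re

# One token per bracket atom, two-letter halogen, organic atom, digit or other symbol.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]|\d|.")


def ring_closures(smiles):
    """Return (opening atom, closing atom) index pairs for each ring digit."""
    open_rings = {}    # digit -> index of the atom that started the ring
    pairs = []
    atom_index = -1
    for token in TOKEN.findall(smiles):
        if token.isdigit():
            if token in open_rings:
                pairs.append((open_rings.pop(token), atom_index))
            else:
                open_rings[token] = atom_index
        elif token[0] == "[" or token.isalpha():
            atom_index += 1
    if open_rings:
        raise ValueError(f"ring digits never closed: {sorted(open_rings)}")
    return pairs


# Naphthalene: atom 0 carries both digits, so it starts two rings.
print(ring_closures("c12ccccc1cccc2"))   # [(0, 5), (0, 9)]
```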


What is the SMILES code for:
a) Nicotine
b) Vanillin
c) Thiamin

Answer :
a) Nicotine
CN1CCC[C@H]1c2cccnc2
b) Vanillin
O=Cc1ccc(O)c(OC)c1

c) Thiamin
OCCc1c(C)[n+](cs1)Cc2cnc(C)nc2N

7.  Examples

Molecule                  Structure      SMILES Formula
Dinitrogen                N≡N            N#N
Methyl isocyanate (MIC)   CH3–N=C=O      CN=C=O
Copper(II) sulfate        Cu2+ SO4 2-    [Cu+2].[O-]S(=O)(=O)[O-]
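
In the copper(II) sulfate example, the dot separates two parts that are not bonded to each other. The small Python sketch below (the function names are hypothetical and it only reads simple bracket charges) splits a SMILES string at the dots and adds up the charges, which should come to zero for a neutral salt.

```python
import re

# A sign run such as "+", "++" or "-" with an optional number, just before "]".
CHARGE = re.compile(r"\[[^\]]*?(\++|-+)(\d*)\]")


def components(smiles):
    """Split a SMILES string into its disconnected parts."""
    return smiles.split(".")


def net_charge(smiles):
    """Add up the charges written on bracket atoms."""
    total = 0
    for sign, number in CHARGE.findall(smiles):
        magnitude = int(number) if number else len(sign)
        total += magnitude if sign[0] == "+" else -magnitude
    return total


salt = "[Cu+2].[O-]S(=O)(=O)[O-]"
for part in components(salt):
    print(part, net_charge(part))   # [Cu+2] 2, then [O-]S(=O)(=O)[O-] -2
print(net_charge(salt))             # 0
```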

That is all we have learned about SMILES. I apologize if there are any mistakes in the explanation.
Some of the information is based on Wikipedia and the lecture notes.

Enjoy your reading and watch this video.




Thursday 7 November 2013

A Lesson On XMLs


XML stands for Extensible Markup Language.
It is a markup language much like HTML; however, it does not replace the latter. Rather, they complement each other.
In most web applications, XML is used to transport data, while HTML is used to format and display the data, with a focus on how the data looks. XML was created specifically to structure, store, and transport information.
With XML, your data can be available to all kinds of "reading machines" (handheld computers, voice machines, news feeds, etc.), which makes it more accessible to blind people and people with other disabilities.



How To Write Your Own XML Document
An example of an XML document:


XML Document (Tree Structure)
XML documents form a tree structure that starts at "the root" and branches to "the leaves", the lowest level of the tree.
All elements can have sub elements (child elements):
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
Children on the same level are called siblings (brothers or sisters).
All elements can have text content and attributes (just like in HTML).
The image above represents one book in the XML document below:
<bookstore>
  <book category="FICTION">
    <title lang="en">Angels And Demons</title>
    <author>Dan Brown</author>
    <year>2001</year>
    <price>35.00</price>
  </book>
  <book category="RELIGIOUS">
    <title lang="en">Most Common Questions Asked By Non Muslims</title>
    <author>Dr. Zakir Naik</author>
    <year>2012</year>
    <price>11.00</price>
  </book>
  <book category="MATHEMATICS">
    <title lang="en">50 Mathematical Ideas</title>
    <author>Tony Crilly</author>
    <year>2007</year>
    <price>50.00</price>
  </book>
</bookstore>
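
To show that the document really is a tree, here is a short Python sketch that reads the bookstore with the standard library module xml.etree.ElementTree. The helper name describe and the variable names are our own; the parsing calls (fromstring, findall, findtext, get) are part of the standard module.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book category="FICTION">
    <title lang="en">Angels And Demons</title>
    <author>Dan Brown</author>
    <year>2001</year>
    <price>35.00</price>
  </book>
  <book category="RELIGIOUS">
    <title lang="en">Most Common Questions Asked By Non Muslims</title>
    <author>Dr. Zakir Naik</author>
    <year>2012</year>
    <price>11.00</price>
  </book>
  <book category="MATHEMATICS">
    <title lang="en">50 Mathematical Ideas</title>
    <author>Tony Crilly</author>
    <year>2007</year>
    <price>50.00</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)   # the root element, <bookstore>


def describe(element, depth=0):
    """Yield one indented line per element, from the root down to the leaves."""
    text = (element.text or "").strip()
    yield "  " * depth + element.tag + (": " + text if text else "")
    for child in element:          # the children are siblings of each other
        yield from describe(child, depth + 1)


titles = [book.findtext("title") for book in root.findall("book")]
categories = [book.get("category") for book in root]   # attribute values

print("\n".join(describe(root)))
print(titles)
print(categories)   # ['FICTION', 'RELIGIOUS', 'MATHEMATICS']
```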

Rules in XML
There are, however, some rules to follow when writing your XML documents.
No. Rules
1. All elements must have a closing tag.

E.g. <p>This is a paragraph</p>

        <p>This is another paragraph</p>

2. Opening and closing tags must be written with the same case.

E.g. <Message>This is incorrect</message>

        <message>This is correct</message>

3. All elements must be properly nested within each other.

E.g. <b><i>This text is bold and italic</i></b>


4. An XML document must contain one root element that is the parent of all other elements.

5. Attribute values must always be quoted.

E.g. <note date="12/11/2007">

           <to>Michael</to>
           <from>John</from>
       </note>

6. Replace a character like "<" inside an XML element with an entity reference.

E.g. <message>if salary < 1000</message> => Error

       <message>if salary  &lt;  1000</message> => Correct

There are 5 predefined entity references in XML: &lt; (<), &gt; (>), &amp; (&), &apos; ('), and &quot; (").
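
The rules above can be tested quickly in Python, because a parser refuses any document that breaks them. This is a minimal sketch: is_well_formed is a helper name we made up, while ET.fromstring, ET.ParseError and xml.sax.saxutils.escape come from the Python standard library. By default, escape replaces the characters &, < and >.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(document):
    """Return True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<message>This is correct</message>"))       # True
print(is_well_formed("<Message>This is incorrect</message>"))     # False, rule 2
print(is_well_formed("<b><i>bold and italic</b></i>"))            # False, rule 3
print(is_well_formed("<to>Michael</to><from>John</from>"))        # False, rule 4
print(is_well_formed("<note date=12/11/2007></note>"))            # False, rule 5
print(is_well_formed("<message>if salary < 1000</message>"))      # False, rule 6

# Rule 6 fixed: escape() turns "<" into the entity reference "&lt;".
safe_message = "<message>" + escape("if salary < 1000") + "</message>"
print(safe_message)                   # <message>if salary &lt; 1000</message>
print(is_well_formed(safe_message))   # True
```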




We hope that you have gained more knowledge about XML from this post, and that you will try to write your very own document. Be creative, and if writing code is outside of your comfort zone, don’t give up! You can do it!
“Don’t be afraid if things seem difficult in the beginning. That’s only the initial impression. The important thing is not to retreat; you have to master yourself.”
- Olga Korbut
If you would like to get your hands on more information on XML, please find a link and a video located below:

Enjoy!