Chemistry - Is there a way to use free software to convert SMILES strings to structures?

Solution 1:

According to its website, Open Babel should do the trick; see the SMILES section of its documentation and the project page on SourceForge.

For example, the following command writes a neat SVG file of the benzene molecule:

obabel -:"c1ccccc1" -O benzene.svg
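If you have more than a handful of molecules, the same command can be driven from a small script. The following is only a sketch: the helper names `obabel_command` and `convert` and the example molecule list are made up for illustration, and it assumes the `obabel` executable is on your PATH.

```python
import shutil
import subprocess


def obabel_command(smiles, out_path):
    """Build the argument list for an obabel SMILES-to-file conversion."""
    # "-:" introduces an inline SMILES string; "-O" names the output file,
    # and the file extension selects the output format.
    return ["obabel", "-:" + smiles, "-O", out_path]


def convert(smiles, out_path):
    """Run obabel if it is installed; return True when a conversion ran."""
    if shutil.which("obabel") is None:
        return False  # obabel is not on PATH, so nothing is run
    subprocess.run(obabel_command(smiles, out_path), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical batch: names and SMILES chosen only for illustration.
    molecules = {"benzene": "c1ccccc1", "ethanol": "CCO"}
    for name, smiles in molecules.items():
        # Print the command that would be run for each molecule.
        print(" ".join(obabel_command(smiles, name + ".svg")))
```

Passing the arguments as a list avoids shell quoting problems with characters such as parentheses or `#` in SMILES strings.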

If you experience problems using it, you are welcome to ask more specifically.

Alternatively, you can use a web query provided by the National Cancer Institute (NCI). It is accessible through a URL of the following form:

http://cactus.nci.nih.gov/chemical/structure/"structure identifier"/"representation"

For example, for benzene: "structure identifier" = c1ccccc1 and "representation" = image, which gives http://cactus.nci.nih.gov/chemical/structure/c1ccccc1/image
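Building such URLs by hand gets error-prone once the SMILES contains characters like `#` or `/`, so a small helper might be useful. This is a minimal sketch: the function name `cactus_url` is hypothetical, and percent-encoding the identifier is my assumption about what the service accepts, not something taken from its documentation.

```python
from urllib.parse import quote

BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cactus_url(identifier, representation="image"):
    """Return the query URL for one structure identifier."""
    # Percent-encode every reserved character (assumption: the service
    # decodes them); "#" in particular would otherwise start a URL fragment.
    return BASE + "/" + quote(identifier, safe="") + "/" + representation


if __name__ == "__main__":
    print(cactus_url("c1ccccc1"))  # benzene, as in the example above
```

The resulting URL can then be opened in a browser or fetched with any HTTP client.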

Another open-source solution is Avogadro, a molecular editor that can load the structure directly. (It uses Open Babel internally, though.)

Depending on the actual problem, however, there might already be more advanced routines.

Solution 2:

In addition to the other good answers, I'd recommend rdkit, a freely available, open-source toolkit for cheminformatics. Most people use rdkit via its Python interface.

Here are some rdkit basics:

  1. The code base is available on GitHub.
  2. The license is quite permissive; you don't need to worry about what type of work (commercial, personal, or academic) you are doing.
  3. The Python API makes rdkit flexible and easy to learn, while the core functions are written in C++, which keeps them fast and efficient. If you happen to be fluent in C++, a C++ API is also available.
  4. It does a whole lot more than convert SMILES to structures; see some examples here.

Here is one way to convert a SMILES to a structure in rdkit.

from rdkit import Chem
from rdkit.Chem import Draw

import matplotlib.pyplot as plt
# IPython/Jupyter magic for inline plots; remove this line in a plain Python script
%matplotlib inline

penicillin_g_smiles = 'CC1([C@@H](N2[C@H](S1)[C@@H](C2=O)NC(=O)Cc3ccccc3)C(=O)O)C'

penicillin_g = Chem.MolFromSmiles(penicillin_g_smiles)

Draw.MolToMPL(penicillin_g, size=(200, 200))

(Image: the code above and the resulting drawing of penicillin G.)


Solution 3:

For those who want to convert a few SMILES strings to images, you can also use the CDK 1.5-based Depict utility from John May (www.simolecule.com/cdkdepict/, GitHub). It provides various options and outputs Scalable Vector Graphics (which can be easily converted into other formats).

For example, caffeine with title: https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=CN1C%3DNC2%3DC1C(%3DO)N(C(%3DO)N2C)C%20caffeine&abbr=on&hdisp=bridgehead&showtitle=true&zoom=1.6&annotate=none

2D caffeine structure representation converted from SMILES

Thus, with this basic web API you can also write a script that converts a whole list of SMILES strings, e.g. using the RCurl package. This Stack Overflow post explains how to convert the SVG to other formats.
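To give an idea of what such a script could look like, here is a rough sketch in Python rather than R/RCurl. The function name `depict_url` is made up for illustration; the query parameters are copied from the caffeine URL above, and I have not checked them against the service's documentation.

```python
from urllib.parse import quote, urlencode

BASE = "https://www.simolecule.com/cdkdepict/depict/bow/svg"


def depict_url(smiles, title=""):
    """Build a CDK Depict URL for one SMILES string with an optional title."""
    # The title follows the SMILES after a space, as in the caffeine example.
    smi = smiles + " " + title if title else smiles
    params = {
        "smi": smi,
        "abbr": "on",
        "hdisp": "bridgehead",
        "showtitle": "true",
        "zoom": "1.6",
        "annotate": "none",
    }
    # quote (not quote_plus) encodes the space as %20; parentheses stay as-is.
    return BASE + "?" + urlencode(params, quote_via=quote, safe="()")


if __name__ == "__main__":
    # Hypothetical input list; replace with your own SMILES and titles.
    for smiles, title in [("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "caffeine")]:
        print(depict_url(smiles, title))
```

Each URL can then be downloaded with the HTTP client of your choice and saved as an `.svg` file.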

However, since you probably prefer a pure R-based solution, please do have a look at the rcdk package.


Solution 4:

I'm surprised that you've had difficulty finding a toolkit. Is it that the licence must be MIT or similarly permissive? I guess that you will be using this in software you are writing, rather than for a one-off data conversion?

For example, Open Babel (C++) or the Chemistry Development Kit (CDK, Java) would seem to suit your needs. In addition, the CDK can interface with R.
