Chemistry - What databases or repositories are there that have generalized SMILES for functional groups?

I'm currently working on a content store for a resource like this because I've realized I need this as well (and came across your question ~2 weeks ago).

Content Variable Store

I am writing a Python package called global-chem that stores commonly used chemical string sets (especially SMILES/SMARTS strings) as variables and can be easily distributed via pip.

https://github.com/Sulstice/global-chem

Usage is fairly simple and takes only a couple of lines:

from global_chem import GlobalChem
functional_groups = GlobalChem().functional_groups_smiles

GlobalChem is a class, and each property on that class corresponds to a different set of SMILES/SMARTS strings.
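To illustrate the "class with properties" idea in isolation, here is a minimal sketch. The class name `ChemStore` and the handful of entries are hypothetical stand-ins I picked for the example; this is not the global-chem source, and the real package may store and return its data differently.

```python
from typing import Dict


class ChemStore:
    """Hypothetical content store: each property exposes one named set of strings."""

    @property
    def functional_groups_smiles(self) -> Dict[str, str]:
        # Illustrative entries only: name -> SMILES fragment.
        return {
            "carboxylic acid": "C(=O)O",
            "nitrile": "C#N",
        }

    @property
    def amino_acids_smiles(self) -> Dict[str, str]:
        # Illustrative entry only: name -> SMILES.
        return {"glycine": "NCC(=O)O"}


store = ChemStore()
for name, smiles in store.functional_groups_smiles.items():
    print(name, smiles)
```

Keeping each set behind its own property means a new collection can be added later without changing how the existing ones are accessed.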

As of right now, I have

  • 93 functional groups as SMILES
  • 85 functional groups as SMARTS
  • 19 Amino Acid SMILES strings

I expect it to grow over time as I and others contribute to it.

If you want to read the docs: https://globalchem.readthedocs.io/en/latest/?badge=latest

Validation

Since these strings can often be hard to validate, I've been using a blend of MolVS and RDKit to make sure they match as expected. Each one probably still needs to be tested and verified manually, so use them at your own discretion while I get some guard rails up.
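As a rough sketch of the kind of check described above, the snippet below uses RDKit alone (MolVS is not shown) to confirm that a SMILES string parses and that a SMARTS pattern is found in a molecule expected to contain the group. The helper names `smiles_parses` and `smarts_matches` are my own for illustration; they are not part of global-chem, and this is not necessarily how the package validates its entries.

```python
try:
    from rdkit import Chem  # third-party dependency: pip install rdkit
except ImportError:
    Chem = None  # lets this file load without RDKit; the helpers then raise


def _require_rdkit():
    """Fail with a clear message if RDKit could not be imported."""
    if Chem is None:
        raise RuntimeError("RDKit is required for these checks")


def smiles_parses(smiles):
    """Return True if RDKit can parse the SMILES string into a molecule."""
    _require_rdkit()
    # MolFromSmiles returns None when the string cannot be parsed.
    return Chem.MolFromSmiles(smiles) is not None


def smarts_matches(smarts, smiles):
    """Return True if the SMARTS pattern is found in the given molecule."""
    _require_rdkit()
    pattern = Chem.MolFromSmarts(smarts)
    mol = Chem.MolFromSmiles(smiles)
    if pattern is None or mol is None:
        # Treat an unparsable pattern or molecule as a failed check.
        return False
    return mol.HasSubstructMatch(pattern)
```

For example, `smarts_matches("[CX3](=O)[OX2H1]", "CC(=O)O")` tests a carboxylic acid pattern against acetic acid, and running the same pattern against ethanol (`CCO`) is a useful negative control. Pairing every pattern with at least one molecule that should match and one that should not catches patterns that are either too strict or too loose.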