SMARTS Sets
Overview
SMARTS queries are used for substructure searches and identifying functional groups. This module offers "sets" of common substructures and functional groups, which can be used to label molecules.
The SmartsSet class manages the search and labeling process for input molecules.
Available Sets
Two default SMARTS sets are available:
The ChemblAlerts set includes several popular subsets:
- Glaxo
- Dundee
- BMS
- PAINS
- MLSMR
Basic Usage
Accessing a Set's Data
The smarts_data and smarts_dict class properties give access to the original raw data (loaded from a CSV file) together with a column of SMARTS Molecule objects.
For instance, using ChemblAlerts:
from simmate.toolkit.smarts_sets import ChemblAlerts
# option 1 (as pandas.DataFrame)
data = ChemblAlerts.smarts_data
# option 2 (as dict)
data = ChemblAlerts.smarts_dict
Counting Matches
To obtain the exact count of each functional group, use get_counts:
from simmate.toolkit.smarts_sets import ChemblAlerts

# `molecule` is an input molecule defined beforehand
matches = ChemblAlerts.get_counts(
    molecule,
    include_misses=False,
)
Listing Matches
If you only need a list of the SMARTS that matched, rather than exact counts, use get_matches:
from simmate.toolkit.smarts_sets import ChemblAlerts
matches = ChemblAlerts.get_matches(molecule)
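The relationship between counting and listing can be pictured with plain Python dictionaries. The sketch below is not Simmate code. It assumes, purely for illustration, that counts form a mapping from SMARTS name to an integer, that include_misses=False means zero-count entries are dropped, and that the listed matches are the names with a nonzero count. The actual return types of get_counts and get_matches may differ, and the helper names and example counts are hypothetical.

```python
# Illustrative only: plain-dict stand-ins, not the Simmate API.

def drop_misses(counts: dict[str, int]) -> dict[str, int]:
    """Keep only entries with at least one match
    (the assumed meaning of include_misses=False)."""
    return {name: n for name, n in counts.items() if n > 0}


def matched_names(counts: dict[str, int]) -> list[str]:
    """List the names that matched at least once, ignoring how often."""
    return [name for name, n in counts.items() if n > 0]


# Hypothetical counts for a single molecule
example_counts = {"nitro group": 2, "aldehyde": 0, "thiol": 1}

hits_only = drop_misses(example_counts)
names = matched_names(example_counts)
```

Under these assumptions, a list of matches carries less information than the counts, which is why it is the lighter option when exact numbers are not needed.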
Parallelization
Warning
The parallelization method for this class is not yet available. Please contact us if you require this feature.
Adding a New Set
To add a new SMARTS set, you need to:
- Create a CSV file with name and smarts_str columns (you can add more columns if needed)
- Inherit from the SmartsSet base class
- Specify the location of the CSV file with the source_file attribute
The SmartsSet base class will then manage the behavior of get_counts and other features.
For example:
from simmate.toolkit.smarts_sets.base import SmartsSet

class CustomAlerts(SmartsSet):
    source_file = "path/to/custom_alerts.csv"
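The CSV file itself can be written by hand or generated. As a minimal sketch, the snippet below uses Python's standard csv module to write a file with the two required columns, name and smarts_str. The file name, the two pattern names, and the SMARTS strings are illustrative placeholders, not part of any set shipped with the module.

```python
import csv

# Hypothetical example rows; the names and SMARTS strings are placeholders
rows = [
    {"name": "carboxylic acid", "smarts_str": "[CX3](=O)[OX2H1]"},
    {"name": "nitro group", "smarts_str": "[N+](=O)[O-]"},
]

# Hypothetical file name; point source_file at wherever this file is saved
csv_path = "custom_alerts.csv"

with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "smarts_str"])
    writer.writeheader()  # first row holds the column names
    writer.writerows(rows)
```

Any additional columns can be appended to fieldnames and to each row in the same way.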