SMARTS Sets
Overview
SMARTS queries are used for substructure searches and identifying functional groups. This module offers "sets" of common substructures and functional groups, which can be used to label molecules.
The SmartsSet class manages the search and labeling process for input molecules.
Available Sets
Two default SMARTS sets are available:
The ChemblAlerts set includes several popular subsets:
- Glaxo
- Dundee
- BMS
- PAINS
- MLSMR
Basic Usage
Accessing a Set's Data
You can access a set's raw data (loaded from a CSV file), together with a column of SMARTS Molecule objects, through the smarts_data and smarts_dict class properties.
For instance, using ChemblAlerts:
from simmate.toolkit.smarts_sets import ChemblAlerts
# option 1 (as pandas.DataFrame)
data = ChemblAlerts.smarts_data
# option 2 (as dict)
data = ChemblAlerts.smarts_dict
Counting Matches
To obtain the exact count of each functional group, use get_counts:
from simmate.toolkit.smarts_sets import ChemblAlerts
# `molecule` is assumed to be a Molecule object created beforehand
counts = ChemblAlerts.get_counts(
    molecule,
    include_misses=False,
)
Listing Matches
If you only need a list of the SMARTS that matched, rather than exact counts, use get_matches:
from simmate.toolkit.smarts_sets import ChemblAlerts
matches = ChemblAlerts.get_matches(molecule)
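The difference between counting and listing can be pictured with plain Python. The sketch below does not call simmate. It assumes, purely for illustration, that counts behave like a mapping from entry name to an integer and that include_misses=False drops entries with a count of zero. The entry names, the numbers, and the helper functions drop_misses and list_matches are all hypothetical.

```python
# Hypothetical per-entry counts for a single molecule. Values are made up,
# and the real return type of get_counts may differ from a plain dict.
example_counts = {
    "nitro group": 2,
    "aldehyde": 0,
    "michael acceptor": 1,
}


def drop_misses(counts):
    """Keep only entries that matched at least once.

    This mirrors the assumed effect of include_misses=False.
    """
    return {name: n for name, n in counts.items() if n > 0}


def list_matches(counts):
    """Reduce counts to the names that matched, discarding the numbers.

    This mirrors the idea behind get_matches.
    """
    return [name for name, n in counts.items() if n > 0]


hits_only = drop_misses(example_counts)
matched_names = list_matches(example_counts)

print(hits_only)
print(matched_names)
```

If only the presence or absence of an alert matters, the list form is usually the simpler one to work with; the counts are useful when the number of occurrences matters.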
Parallelization
Warning
The parallelization method for this class is not yet available. Please contact us if you require this feature.
Adding a New Set
To add a new SMARTS set, you need to:
- Create a CSV file with name and smarts_str columns (you can add more columns if needed)
- Inherit from the SmartsSet base class
- Specify the location of the CSV file with the source_file attribute
The SmartsSet will then manage the behavior of get_counts and other features.
For example:
from simmate.toolkit.smarts_sets.base import SmartsSet
class CustomAlerts(SmartsSet):
    # path to the CSV file containing the name and smarts_str columns
    source_file = "path/to/custom_alerts.csv"
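The CSV file that source_file points to could be produced with a short script such as the one below. This is a minimal sketch using only the Python standard library. The three entries are illustrative examples and not part of any shipped set, and the file is written to a temporary directory only to keep the example self-contained.

```python
import csv
import tempfile
from pathlib import Path

# Illustrative entries: "name" is a free-form label and "smarts_str" is the
# SMARTS pattern. These rows are examples only.
rows = [
    {"name": "nitro", "smarts_str": "[N+](=O)[O-]"},
    {"name": "aldehyde", "smarts_str": "[CX3H1](=O)[#6]"},
    {"name": "thiol", "smarts_str": "[SX2H]"},
]

# Written to a temporary directory here; in practice, save the file wherever
# the source_file attribute of your set will point.
csv_path = Path(tempfile.mkdtemp()) / "custom_alerts.csv"

with csv_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "smarts_str"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm the header and the number of rows.
with csv_path.open(newline="") as f:
    loaded = list(csv.DictReader(f))

print(loaded[0])
```

Writing through the csv module rather than joining strings by hand means that SMARTS patterns containing commas are quoted automatically.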