Skip to content

Loading a Molecule


Introduction

To load a molecule, call a from_ method of Molecule.

For instance, from_smiles for a SMILES input, from_mol2 for a MOL2 input, and from_rdkit for an RDKit object. Choose the method that corresponds to your input type, or use the from_dynamic strategy if you're unsure or have a variety of input types.

Tip

from_dynamic is the simplest and most convenient method, but it may not always work! If you know your molecule's format, use the specific method for it.


Dynamic Loading

Dynamic loading examines your input and determines how to convert it into a Molecule object. It performs checks and then calls one of the methods detailed on this page.

from simmate.toolkit import Molecule

# try this with a filename, a smiles string, SDF string, rdkit object, ...
input_01 = "example_molecule.sdf"
input_02 = "example_molecule.csv" 
input_03 = "C1=CC(=C(C=C1CCN)O)O"

# The from_dynamic method will determine the format and convert it
for new_input in [input_01, input_02, input_03]
    molecule = Molecule.from_dynamic(new_input)

Note

from_dynamic also checks if we already have a Molecule object and returns it if we do.


Basic Loading

Files

File-based inputs accept a filename as a string or a pathlib.Path object.

from simmate.toolkit import Molecule

molecule = Molecule.from_sdf_file("example.sdf")
TYPE METHOD
(dynamic loading) from_file
CSV from_csv_file
SMILES (any type) from_smiles_file
SDF (aka CTAB) from_sdf_file
MOL2 from_mol2_file

Tip

Each of these methods has a corresponding submethod for loading this format directly from text/str, detailed in the section below. For instance, from_smiles takes a string, while from_smiles_file takes a .smi file.


Raw text / strings

You can read a python string variable directly. These methods are primarily used for testing and debugging.

from simmate.toolkit import Molecule

molecule = Molecule.from_smiles("C1=CC(=C(C=C1CCN)O)O")
TYPE METHOD
INCHI from_inchi
SMILES from_smiles
SMARTS from_smarts
SDF (aka CTAB) from_sdf
MOL2 from_mol2

Tip

Each of these methods has a corresponding submethod for loading this format directly from a file, detailed in the section above. For instance, from_smiles takes a string, while from_smiles_file takes a .smi file.


Python Objects

Methods are available to convert other popular python objects, such as those from RDKit

from simmate.toolkit import Molecule
from rdkit import Chem

# Smiles -> RDKit -> Simmate
rdkit_mol = Chem.MolFromSmiles("Cc1ccccc1")
molecule = Molecule.from_rdkit(rdkit_mol)
# !!! NOT RECOMMENDED !!!

# Smiles -> Simmate 
molecule = Molecule.from_smiles("Cc1ccccc1")
# !!! RECOMMENDED !!!
TYPE METHOD
RDKit Mol object from_rdkit
RDKit Mol object written as binary from_binary
Simmate (aka nothing needs to be done) from_dynamic
pathlib.Path see from_file section above

Database Entries

Warning

Loading from database metadata is still in progress. Refer to our guides on Python ORM to access datasets as Molecule objects quickly.

For example:

from simmate.database import connect
from simmate.datasets.models import CortevaCore

molecule_db = CortevaCore.objects.get(id=123)
molecule = molecule_db.to_toolkit()