Creating Nested Workflows


Overview

Because workflows can contain any Python code, they can also make calls to other workflows -- either via the run or run_cloud methods.

This lets you call one or several workflows in succession -- or even submit them to the cluster when you have many analyses to run.


Calling one workflow repeatedly

You can use the run method of one workflow within another workflow and call it as many times as you'd like.

from simmate.workflows.utilities import get_workflow
from simmate.engine import Workflow

class Example__Python__MyFavoriteSettings(Workflow):

    use_database = False  # we don't have a database table yet

    @staticmethod
    def run_config(structure, **kwargs):

        # you can grab a workflow locally, attach one as a class
        # attribute, or anything else possible with python
        another_workflow = get_workflow("static-energy.vasp.mit")

        # And run the workflow how you would like. Here, we are
        # just running the workflow 10 times in a row on different
        # perturbations or "rattling" of the original structure
        for n in range(10):
            structure.perturb(0.05)  # modifies in-place
            state = another_workflow.run(structure=structure)
            result = state.result()
            # ... do something with the result

Note

Notice that we are calling state.result() just like we would for a normal workflow run. Usage is exactly the same.
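
The result returned by state.result() is the database entry for the finished run, so you can read its columns directly. A minimal sketch continuing the example above, assuming the static-energy table exposes the energy_per_atom column (it is used the same way later in this guide):

state = another_workflow.run(structure=structure)
result = state.result()

# read a column straight off the database object
print(result.energy_per_atom)

# or convert back to a toolkit object for further manipulation
structure_final = result.to_toolkit()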


Calling multiple workflows

You can also call a series of workflows on an input. Again, any Python code is accepted within run_config, so workflow usage does not change:

from simmate.workflows.utilities import get_workflow
from simmate.engine import Workflow

class Example__Python__MyFavoriteSettings(Workflow):

    use_database = False

    @staticmethod
    def run_config(structure, directory, **kwargs):

        # run each analysis in sequence -- each call to run blocks
        # until that subworkflow finishes
        subworkflow_1 = get_workflow("static-energy.vasp.mit")
        subworkflow_1.run(structure=structure)

        subworkflow_2 = get_workflow("population-analysis.vasp.elf-matproj")
        subworkflow_2.run(structure=structure)

        subworkflow_3 = get_workflow("electronic-structure.vasp.matproj-full")
        subworkflow_3.run(structure=structure)
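
If you also need the output of a step, capture the state returned by its run call, just as before. A minimal sketch for the first subworkflow (the "Passing results between runs" section below builds on this):

state_1 = subworkflow_1.run(structure=structure)
result_1 = state_1.result()  # blocks until the run completes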

Writing all runs to a shared directory

When using run, you often want workflows to share a working directory so that all of the results end up in one place.

To do this, we simply need to set the directory manually for each subworkflow run:

from simmate.workflows.utilities import get_workflow
from simmate.engine import Workflow

class Example__Python__MyFavoriteSettings(Workflow):

    use_database = False

    @staticmethod
    def run_config(structure, directory, **kwargs):  # <-- uses directory as an input
        another_workflow = get_workflow("static-energy.vasp.mit")
        for n in range(10):
            structure.perturb(0.05)

            # make sure the directory name is unique
            subdirectory = directory / f"perturb_number_{n}"

            another_workflow.run(
                structure=structure,
                directory=subdirectory, # <-- creates a subdirectory for this run
            )

Tip

Also see writing output files
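
As a quick illustration, you could collect the states from each perturbation run and then write a simple summary file at the top of the shared directory. The summary.txt name and its contents here are just for illustration (not part of the Simmate API), and we assume the energy_per_atom column shown elsewhere in this guide:

states = []
for n in range(10):
    structure.perturb(0.05)
    subdirectory = directory / f"perturb_number_{n}"
    state = another_workflow.run(structure=structure, directory=subdirectory)
    states.append(state)

# write a summary file in the shared parent directory
with (directory / "summary.txt").open("w") as file:
    for n, state in enumerate(states):
        result = state.result()
        file.write(f"perturbation {n}: {result.energy_per_atom} eV/atom\n")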

Danger

When using run_cloud, you should NOT share a working directory. This causes problems when you have computational resources scattered across different computers & file systems. See github #237.


Passing results between runs

When you grab the result from one subworkflow, you can interact with that database object to pass the results to the next subworkflow.

from simmate.workflows.utilities import get_workflow
from simmate.engine import Workflow

class Example__Python__MyFavoriteSettings(Workflow):

    use_database = False

    @staticmethod
    def run_config(structure, directory, **kwargs):

        subworkflow_1 = get_workflow("relaxation.vasp.mit")
        state_1 = subworkflow_1.run(structure=structure)
        result_1 = state_1.result()

        # When passing structures, we can directly use the result. This is
        # because the 'structure' parameter accepts database objects as input.
        subworkflow_2 = get_workflow("static-energy.vasp.mit")
        state_2 = subworkflow_2.run(
            structure=result_1,  # use the final structure of the last calc
        )
        result_2 = state_2.result()

        # Alternatively, you may want to mutate or analyze the result in 
        # some way before submitting a new calculation
        if result_2.energy_per_atom > 0:
            print("Structure is very unstable even after relaxing!")
            # maybe the atoms are too close, so let's increase the volume by 20%
            structure_new = result_2.to_toolkit()
            structure_new.scale_lattice(
                volume=structure_new.volume * 1.2,
            )
            # and try the workflow again
            state_2 = subworkflow_2.run(
                structure=structure_new,  # use the modified structure
            )

Submitting parallel workflows

Sometimes, we don't want to pause and wait for each workflow run to finish. There are even cases where we might submit hundreds of workflow runs that are independent and can run in parallel.

To do this, we can use the run_cloud method instead of run.

from simmate.workflows.utilities import get_workflow
from simmate.engine import Workflow

class Example__Python__MyFavoriteSettings(Workflow):

    use_database = False  # we don't have a database table yet

    @staticmethod
    def run_config(structure, **kwargs):

        another_workflow = get_workflow("static-energy.vasp.mit")

        # keep track of the runs we submit
        submitted_states = []

        for n in range(10):
            structure.perturb(0.05)  # modifies in-place

            # submit to cloud instead of running locally
            state = another_workflow.run_cloud(structure=structure)

            # add the state to our list
            submitted_states.append(state)

            # do NOT call result yet! This will block and wait for this 
            # calculation to finish before continuing
            # state.result()

        # now wait for all the calculations to finish and grab the results
        results = [state.result() for state in submitted_states]

        # And workup the results as you see fit
        for result in results:
            print(result.energy_per_atom)

Danger

When using run_cloud, you should NOT share a working directory. This causes problems when you have computational resources scattered across different computers & file systems. See github #237.

Tip

Using state.result() to wait for each result is optional too -- you decide when to call it (if at all). You can even have a workflow that just submits runs and then shuts down -- without ever waiting on the results.
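
For example, a minimal fire-and-forget sketch (the class name here is just for illustration) submits all runs and exits without ever collecting results:

from simmate.workflows.utilities import get_workflow
from simmate.engine import Workflow

class Example__Python__SubmitOnly(Workflow):

    use_database = False

    @staticmethod
    def run_config(structure, **kwargs):
        another_workflow = get_workflow("static-energy.vasp.mit")
        for n in range(10):
            structure.perturb(0.05)  # modifies in-place
            # submit and move on -- never call state.result()
            another_workflow.run_cloud(structure=structure)
        # the workflow exits here; the runs finish on the cluster on their own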