
Basic Workflow Use


List Available Workflows

Command line:

simmate workflows list-all

Python:

from simmate.workflows.utilities import get_all_workflow_names

names = get_all_workflow_names()

There are several more tools in simmate.workflows.utilities to help you explore:

  • get_all_workflows
  • get_all_workflow_names
  • get_all_workflow_types
  • get_apps_by_type
  • get_workflow_names_by_type
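
For example, you can list the workflow types and then drill into one of them. A quick sketch (the exact signatures live in simmate.workflows.utilities; the type string passed to get_workflow_names_by_type is an assumption here):

from simmate.workflows.utilities import (
    get_all_workflow_types,
    get_workflow_names_by_type,
)

# list the high-level workflow types (e.g. static-energy, relaxation, ...)
print(get_all_workflow_types())

# then list all workflow names under one type (assumed signature)
print(get_workflow_names_by_type("static-energy"))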

Load a Workflow

"Loading" a workflow only applies in python. Use the get_workflow method, which will return the requested Workflow subclass:

from simmate.workflows.utilities import get_workflow

workflow_name = "static-energy.vasp.matproj"
workflow = get_workflow(workflow_name)

View Parameters & Options

For detailed information about a specific workflow's parameters:

Command line:

simmate workflows explore

Python:

workflow.show_parameters()

There are several properties & methods available for all Workflow subclasses:

  • show_parameters()
  • parameter_names
  • parameter_names_required
  • parameter_defaults
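
For example, a quick pre-flight check in python using the properties above:

from simmate.workflows.utilities import get_workflow

workflow = get_workflow("static-energy.vasp.matproj")

# print a formatted overview of all parameters
workflow.show_parameters()

# or inspect the raw values directly
print(workflow.parameter_names_required)
print(workflow.parameter_defaults)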

Tip

We've dedicated an entire section of our documentation to workflow parameters. Please familiarize yourself with this section for detailed parameter descriptions and examples.


Run a Workflow (Local)

To execute a workflow on your local machine, use the run command or method:

YAML file:

# in example.yaml
workflow_name: relaxation.vasp.matproj
structure: NaCl.cif
command: mpirun -n 8 vasp_std > vasp.out

simmate workflows run example.yaml

TOML file:

# in example.toml
workflow_name = "relaxation.vasp.matproj"
structure = "NaCl.cif"
command = "mpirun -n 8 vasp_std > vasp.out"

simmate workflows run example.toml

Command line (no settings file):

simmate workflows run-quick relaxation.vasp.matproj --structure NaCl.cif

Python:

from simmate.workflows.utilities import get_workflow

workflow = get_workflow("relaxation.vasp.matproj")
status = workflow.run(structure="NaCl.cif")
result = status.result()

Website:

https://simmate.org/workflows/static-energy/vasp/matproj/submit
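
When working in python, you aren't limited to filenames. As a sketch, the structure parameter should also accept an already-loaded toolkit object (see the workflow parameters docs for the full list of accepted formats), which is useful if you want to inspect or modify the structure first:

from simmate.toolkit import Structure
from simmate.workflows.utilities import get_workflow

# load the file ourselves so we can inspect/modify it before running
structure = Structure.from_file("NaCl.cif")

workflow = get_workflow("relaxation.vasp.matproj")
status = workflow.run(
    structure=structure,
    command="mpirun -n 8 vasp_std > vasp.out",
)
result = status.result()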

Run a Workflow (Cloud)

Workflows can also be executed on a remote cluster. It's important to understand the differences between local and cloud runs:

Local ('run'):

graph TD
  A[submit with 'run' command] --> B[starts immediately on your local computer];

Cloud ('run-cloud'):

graph TD
  A[submit with 'run-cloud' command] --> B[adds job to scheduler queue];
  B --> C[waits for a worker to pick up job];
  C --> D[worker selects job from queue];
  D --> E[runs the job where the worker is];
  F[launch a worker with 'start-worker' command] --> D;

To schedule a workflow to run on a remote cluster, first ensure your computational resources are configured. Then use the run-cloud command or the run_cloud method:

YAML file:

# in example.yaml
workflow_name: static-energy.vasp.matproj
structure: NaCl.cif
command: mpirun -n 4 vasp_std > vasp.out

simmate workflows run-cloud example.yaml

Python:

from simmate.workflows.utilities import get_workflow

workflow = get_workflow("static-energy.vasp.matproj")

status = workflow.run_cloud(
    structure="NaCl.cif",
    command="mpirun -n 4 vasp_std > vasp.out",
)

result = status.result() # (1)
  1. This will block and wait for the job to finish

Warning

The run-cloud command/method only schedules the workflow. It won't run until you add computational resources (i.e. Workers). To set these up, read through the "Computational Resources" documentation.


View Workflow Results

Option 1: Output Files

Navigate to the directory where the calculation was run to find output files (if any). Some of these include:

  • simmate_metadata.yaml: original input parameters for the workflow run
  • simmate_summary.yaml: a summary of information that is saved to the database
  • simmate_corrections.csv: lists the errors encountered (if any) and how they were resolved
  • others: for example, relaxation & electronic-structure will output plots
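
Because simmate_summary.yaml is plain YAML, it is also easy to load in a quick script. A minimal sketch using pyyaml:

import yaml

# load the summary written at the end of the run
with open("simmate_summary.yaml") as file:
    summary = yaml.safe_load(file)

# see which fields were recorded
print(summary.keys())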

Tip

While the plots and summary files are useful for quick viewing, there is much more information available in the database.

Option 2: Python Objects

Access the result directly in python. Workflows can return any type of object; however, workflows that save their results to a database table will return the corresponding database object.

status = workflow.run(...)
result = status.result()  # (1)
  1. Returns a Database object. In some cases, you can convert to a toolkit structure using result.to_toolkit()
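
As a sketch of what you can do with that returned object (the to_toolkit conversion is the one noted above; the formula attribute assumes a pymatgen-style toolkit structure):

status = workflow.run(structure="NaCl.cif")
result = status.result()

# convert the database entry back to a toolkit structure
structure = result.to_toolkit()
print(structure.formula)  # pymatgen-style attribute (assumption)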

For viewing the results of many workflow runs:

results = workflow.all_results  # (1)
  1. This takes the relevant table (e.g. StaticEnergy) and filters it down to all results matching this workflow name.
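
Because all_results is backed by a database table, standard Django-style filtering applies. A sketch, assuming the underlying table includes an energy_per_atom column (as StaticEnergy does):

# grab the 10 lowest-energy results for this workflow
best = workflow.all_results.order_by("energy_per_atom")[:10]

for row in best:
    print(row.id, row.energy_per_atom)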

Tip

View the Database guides for advanced filtering and data manipulation.

Option 3: The Database

You can view the data directly via SQL. For example:

SELECT *
FROM workflows_staticenergy
WHERE workflow_name = 'static-energy.vasp.mit';
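
If you'd rather stay in python, the same query can be written with Simmate's ORM. A sketch, assuming the results table is exposed as simmate.database.workflow_results.StaticEnergy:

from simmate.database import connect  # must be imported before loading tables
from simmate.database.workflow_results import StaticEnergy  # assumed module path

results = StaticEnergy.objects.filter(
    workflow_name="static-energy.vasp.mit",
).all()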

Tip

We recommend exploring database tables using DBeaver.

Option 4: The Website Server

Warning

This is an experimental feature that is still in early development.

The simmate_summary.yaml output file includes a _WEBSITE_URL_ entry. You can copy/paste this URL into your browser to view your results in an interactive format. Just make sure your local server is running first:

simmate run-server

Then open the link given by _WEBSITE_URL_:

http://127.0.0.1:8000/workflows/static-energy/vasp/mit/1

Run Massively Parallel Workflows

Some workflows submit many subworkflows. For example, evolutionary structure prediction submits hundreds of individual structure relaxations, analyzes the results, and then submits new structures based on those results.

This is achieved by the main workflow manually calling run-cloud on its subworkflows. If you start multiple workers elsewhere, these subworkflows can run in parallel:

graph TD
  A[main workflow];
  A --> B[subworkflow];
  B --> C[schedule run 1] --> G[scheduler];
  B --> D[schedule run 2] --> G;
  B --> E[schedule run 3] --> G;
  B --> F[schedule run 4] --> G;
  G --> H[worker 1];
  G --> I[worker 2];
  G --> J[worker 3];

To run these types of workflows, you must:

  1. Start the main workflow with the run command
  2. Start at least one worker that will run the submitted calculations
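
You can follow the same pattern in your own scripts: schedule many runs with run_cloud up front, then let your pool of workers chew through the queue. A minimal sketch (the cif filenames are placeholders):

from simmate.workflows.utilities import get_workflow

subworkflow = get_workflow("relaxation.vasp.matproj")

# schedule all runs up front; run_cloud returns immediately
statuses = [
    subworkflow.run_cloud(structure=filename)
    for filename in ["candidate_1.cif", "candidate_2.cif", "candidate_3.cif"]
]

# block until every run has been completed by a worker
results = [status.result() for status in statuses]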

Note

The number of workers determines how many jobs run in parallel -- and this is only limited by the number of jobs queued. For example, if I submit 500 workflows with run-cloud but only start 100 workers, then only 100 workflows will run at a time. Conversely, if I submit 25 workflows but have 100 workers, then 75 of the workers will sit idle without any job to run.