
Third-party database access

This module downloads data from third parties and stores it in your local database.

This data is NOT from the Simmate team. These providers are independent groups, and you should cite them appropriately. All data remain under the corresponding provider's terms and conditions.

Currently, we support the following providers:

These providers are configured, but our team is waiting for permission to redistribute their data:

Tip

If your team would like to make your own data available via Simmate, please see the Contributing data module. Even if it is a single table, don't hesitate to make a contribution! We outline the benefits of contributing and how to package your data within the for_providers module.


Downloading data

Make sure you have completed our introductory tutorial for downloading data from these providers. Below we show example usage with MatprojStructure, but the same process can be done with all other tables in this module.

WARNING: The first time you load archives of data, it can take a long time, so we recommend running the download overnight. Once it has completed, we also recommend backing up your database (by making a copy of your ~/simmate/my_env-database.sqlite3 file). This ensures you don't have to repeat this long process.
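
The backup step can be scripted with the Python standard library. The sketch below is only an illustration and is not part of Simmate: the `backup_database` helper and its timestamped naming scheme are assumptions. It makes a plain file copy, so run it only when nothing is writing to the database.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_database(db_path, backup_dir=None):
    """Copy a SQLite database file to a timestamped backup and return the new path.

    Hypothetical helper for illustration; it is not provided by Simmate.
    """
    db_path = Path(db_path).expanduser()
    if not db_path.is_file():
        raise FileNotFoundError(f"No database file found at {db_path}")

    # Default to saving the backup next to the original file
    backup_dir = Path(backup_dir).expanduser() if backup_dir else db_path.parent
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Timestamp the copy so repeated backups do not overwrite each other
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup_path = backup_dir / f"{db_path.stem}-backup-{stamp}{db_path.suffix}"

    # copy2 also preserves file metadata such as the modification time
    shutil.copy2(db_path, backup_path)
    return backup_path
```

For the default database location named above, this would be called as `backup_database("~/simmate/my_env-database.sqlite3")`.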

To download all data into your database:

simmate database load-remote-archives

Or, in Python, you can download a specific table:

from simmate.database.third_parties import MatprojStructure

# This can take >1 hour for some providers. Optionally, you can
# add `parallel=True` to speed up this process, but use caution when 
# parallelizing with SQLite (the default backend). We recommend 
# avoiding the use of parallel=True, and instead running
# this line overnight.
MatprojStructure.load_remote_archive()

# If you use this provider's data, be sure to cite them!
MatprojStructure.source_doi

Populating energy fields

Some database providers give a calculated energy, which can be used to populate stability information:

# updates ALL chemical systems.
# Note, this can take over an hour for some providers. Try running 
# this overnight along with your call to load_remote_archive.
MatprojStructure.update_all_stabilities()

# updates ONE chemical system
# This can be used if you quickly want to update a specific system
MatprojStructure.update_chemical_system_stabilities("Y-C-F")
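
Since both the archive download and the full stability update can take over an hour, it may be convenient to run them back to back in one overnight script. The sketch below is a minimal illustration: `load_and_update` is a hypothetical helper, not part of Simmate, and it calls only the two methods shown above on whichever table class it is given.

```python
import time


def load_and_update(table):
    """Run the two long steps in order and return the elapsed seconds for each.

    Hypothetical helper for illustration; `table` is any table class that
    offers load_remote_archive() and update_all_stabilities().
    """
    timings = {}

    # Step 1: download the provider's archive into the local database
    start = time.perf_counter()
    table.load_remote_archive()
    timings["load_remote_archive"] = time.perf_counter() - start

    # Step 2: populate stability information from the calculated energies
    start = time.perf_counter()
    table.update_all_stabilities()
    timings["update_all_stabilities"] = time.perf_counter() - start

    return timings


# Example usage (requires Simmate and a configured database):
# from simmate.database.third_parties import MatprojStructure
# print(load_and_update(MatprojStructure))
```

Returning the timings gives a rough idea of how long each step took on your machine, which helps when planning later runs for other providers.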

Alternatives

This module can be viewed as an alternative to and/or an extension of the following codes:

This module stores data locally and then allows rapidly loading it into memory, whereas the alternatives query external APIs and load the results into memory. We choose to store data locally because it provides stability (i.e. no breaking changes in your source data) and fast loading across Python sessions. This is particularly useful for high-throughput studies.