Basic Database Access¶
Outline of all steps¶
Accessing and analyzing data typically involves the following steps:
- Connect to your database
- Load a specific database table
- Filter data
- Convert data to a desired format
- Modify data via
The sections below will guide you on performing each step. But to place everything up-front, your final script may look like this:
# Connect to your database from simmate.database import connect # Load a specific database table from simmate.database.third_parties import MatprojStructure # Filter data results = MatprojStructure.objects.filter( nsites=3, is_gap_direct=False, spacegroup=166, ).all() # Convert data to a desired format structures = results.to_toolkit() dataframe = results.to_dataframe() # Modify data for structure in structures: # run your anaylsis/modifications here!
Connect to your database¶
For interactive use, Django settings must be configured before any of these submodules can be imported. This can be done with...
# connect to the database from simmate.database import connect # and now you can import tables in this module from simmate.database.workflow_results import MITStaticEnergy
connect step is not done, you will recieve the following error:
ImproperlyConfigured: Requested setting INSTALLED_APPS, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.
Load your database table¶
The name of your table will depend on the source you're trying to access. To see the available sources (Materials Project, OQMD, Jarvis, COD), you can explore the contents the database/third_parties module.
Using Materials Project as an example, we can load the table using...
from simmate.database.third_parties import MatprojStructure
Alternatively, if you intend to access data from a specific workflow, there are two methods to access the table. in addition to loading from the
workflow_results module, most workflows have a
database_table attribute that let you access the table as well:
########## METHOD 1 ######## from simmate.workflows.static_energy import mit_workflow table = mit_workflow.database_table ######## METHOD 2 ######## from simmate.database import connect from simmate.database.workflow_results import MITStaticEnergy # The line below shows that these tables are the same! Therfore, use # whichever method you prefer. assert table == MITStaticEnergy
Query and filter data¶
To query a table, Simmate inherits methods from Django, which is a web framework for quering massive datasets. It is powerful and efficient, and is therefore used to deliver data to many familiar websites, such as Instagram and Spotify. The key feature of Django that we use is its Object-Relational Mapper (ORM). The ORM allows us to use a simple language for making complex queries to our database. Below, we show some common queries. A full description of all query methods is discussed on Django's query page.
All rows of the database table are available via the
All columns of the database table can be printed via the
To filter rows with exact-value matches in a column:
MITStaticEnergy.objects.filter( nsites=3, is_gap_direct=False, spacegroup=166, ).all()
To filter rows based on conditions, chain the column name with two underscores. Conditions supported are listed here, but the most commonly used ones are:
contains= contains text, case-sensitive query
icontains= contains text, case-insensitive query
gt= greater than
gte= greater than or equal to
lt= less than
lte= less than or equal to
range= provides upper and lower bound of values
Trueif the entry does not exist
An example query with conditional filters:
MITStaticEnergy.objects.filter( nsites__gte=3, # greater or equal to 3 sites energy__isnull=False, # the structure DOES have an energy density__range=(1,5), # density is between 1 and 5 elements__icontains='"C"', # the structure includes the element Carbon spacegroup__number=167, # the spacegroup number is 167 ).all()
Note, for the filtering condition
elements__icontains, we used some odd quotations when querying for carbon:
'"C"'. This is not a typo! The quotes ensure we don't accidentally grab Ca, Cs, Ce, Cl, and so on. This is an issue when you are using SQLite (the default datbase backend). If you are using Postgres, this line can change to the cleaner version
Convert data to desired format¶
By default, Django returns your query results as a
SearchResults in simmate). This is a list of database objects. It is more useful to convert them to a pandas dataframe or to toolkit objects.
# Gives a pandas dataframe. df = MITStaticEnergy.objects.filter(...).to_dataframe() # Gives a list of toolkit Structure objects df = MITStaticEnergy.objects.filter(...).to_toolkit() # '...' are the set of filters selected from above.
To modify and analyze data, see the pandas and
simmate.toolkit documentation for more info.