Skip to content

b) Building a database

Tip

Ensure you've read the previous section! Database setup can be complex, and most users can bypass it entirely.


Selecting your database engine

Simmate employs Django ORM for database construction and management, meaning any Django-supported database can be used with Simmate.

This encompasses PostgreSQL, MariaDB, MySQL, Oracle, SQLite, and others via third-party providers. noSQL databases like MongoDB are supported through djongo. Comprehensive documentation for Django databases is available here.

However, we strongly recommend opting for Postgres, which we discuss in the following section.

Warning

Our team utilizes SQLite (for local testing) and PostgreSQL (for production), so currently, we can only provide guidance on these two backends. You're free to use others, but be aware that we haven't extensively tested these backends and may not be able to assist with troubleshooting if issues occur.


Introduction to Postgres setup

PostgreSQL is free and open-source, allowing you to avoid costs and set it up manually.

However, using a database service such as DigitalOcean, Linode, GoogleCloud, AWS, Azure, or another provider is MUCH simpler. These providers set up the database for you through a user-friendly interface.

If you prefer to manually build a Postgres server, numerous tutorials and guides are available (1, 2, etc.). Be aware that this can be time-consuming and your final database connection may be slower if your team operates from multiple locations.


Setting up Postgres with DigitalOcean

Introduction & expected costs

Our team uses DigitalOcean, where the basic database server (~$15/month) is sufficient for Simmate usage. You'll only need >10GB if you're running >100,000 structure relaxations or frequently using unit cells with >1000 atoms.

(i) Account creation

Start by creating an account on DigitalOcean using this link (our referral). We suggest signing in with your Github account. This referral link provides:

  1. $100 credit for servers from DigitalOcean (valid for 60 days)
  2. $10 credit for the Simmate team from DigitalOcean, helping fund our servers

If you encounter any issues, please verify that DigitalOcean is still offering this deal here. Simmate is not affiliated with DigitalOcean.

(ii) Cloud database creation

  1. On the DigitalOcean dashboard, click the green "Create" button in the top right and select "Database". This should take you to this page.
  2. For "database engine", select the latest version of PostgreSQL (currently v14)
  3. Leave the rest of the page's options at their default values.
  4. Click Create a Database Cluster when ready.
  5. On your new cluster's homepage, there's a "Get Started" button. We'll go through this dialog in the next section.

Note, this is the database cluster, which can host multiple databases (each with their own tables).

(iii) Database connection

Before setting up our database on this cluster, we'll first try connecting to the default database on it (named defaultdb).

  1. On your new database's page, you'll see a "Getting Started" dialog -- select it!
  2. "Restrict inbound connections" is optional and beginners should skip it for now. We skip this because if you're running calculations on a supercomputer/cluster, you'll need to add ALL the associated IP addresses for connections to work properly. That's a lot of IP addresses to collect and configure correctly -- so we leave this to advanced users.
  3. "Connection details" is the information we need to provide to Simmate/Django. Let's copy this information. For example, here's what the details look like on DigitalOcean:
    username = doadmin
    password = asd87a9sd867fasd
    host = db-postgresql-nyc3-49797-do-user-8843535-0.b.db.ondigitalocean.com
    port = 25060
    database = defaultdb
    sslmode = require
    
  4. In your Simmate Python environment, ensure you have the Postgres engine installed. The package is psycopg2, which allows Django to communicate with Postgres. To install this, run the command:

    conda install -n my_env -c conda-forge psycopg2
    

  5. We need to pass this information to Simmate (which connects using Django). To do this, add a file named my_env-settings.yaml (using your conda env name) to your Simmate config directory (~/simmate) with the following content -- make sure to substitute in your connection information:

    database:
      engine: django.db.backends.postgresql
      host: db-postgresql-nyc3-49797-do-user-8843535-0.b.db.ondigitalocean.com
      name: defaultdb
      user: doadmin
      password: asd87a9sd867fasd
      port: 25060
      options:
        sslmode: require
    

  6. Verify that you can connect to this database on your local computer by running the following in Spyder:
    from simmate.configuration import settings
    
    print(settings.database)  # this should display your connect info!
    

(iv) Creating a separate database for testing (on the same server)

Just as we don't use the (base) environment in Anaconda, we don't want to use the default database defaultdb on our cluster. Here we'll create a new database -- one that we can delete if we want to start over.

  1. On DigitalOcean with your Database Cluster page, select the "Users&Databases" tab.
  2. Create a new database using the "Add new database" button and name it simmate-database-00. We name it this way because you may want to create new/separate databases and numbering is a quick way to keep track of these.
  3. In your connection settings (from the section above), switch the NAME from defaultdb to simmate-database-00. You will change this in your my_env-database.yaml file.

(v) Building our database tables

Now that we've set up and connected to our database, we can create our Simmate database tables and start populating them with data! We do this the same way we did without a cloud database:

  1. In your terminal, ensure you have your Simmate environment activated
  2. Run the following command:
    simmate database reset
    
  3. You're now ready to start using Simmate with your new database!

(vi) Creating a connection pool

When we have multiple calculations running simultaneously, we need to ensure our database can handle all these connections. Therefore, we create a connection pool which allows for thousands of connections. This "pool" operates like a waitlist where the database handles each connection request in sequence.

  1. Select the "Connection Pools" tab and then "Create a Connection Pool"
  2. Name your pool simmate-database-00-pool and select simmate-database-00 for the database
  3. Select "Transaction" for our mode (the default) and set our pool size to 10 (or adjust this value as needed)
  4. Create the pool when ready!
  5. You'll need to update your my_env-settings.yaml file to these connection settings. At this point your file will look similar to this (note, our NAME and PORT values have changed):
    database:
      engine: django.db.backends.postgresql
      host: db-postgresql-nyc3-49797-do-user-8843535-0.b.db.ondigitalocean.com
      name: simmate-database-00-pool  # THIS LINE WAS UPDATED
      user: doadmin
      password: asd87a9sd867fasd
      port: 25061
      options:
        sslmode: require
    

Warning

Calling simmate database reset when using a connection pool will NOT work! If you ever need to reset your database, ensure you connect to the database directly instead of through a database pool.

(vii) Loading third-party data

This step is optional.

With Sqlite, we could download a prebuilt database with data from third parties already in it. However, creating our Postgres database means our database is entirely empty.

To load ALL third-party data (~10GB total), you can use the following command. We can also use Dask to run this in parallel and speed things up. Depending on your internet connection and CPU speed, this can take up to 24 hours.

simmate database load-remote-archives --parallel

Warning

--parallel will use all cores on your CPU. Keep this in mind if you are running other programs/calculations on your computer already.

(viii) Sharing the database

If you want to share this database with others, they simply need to copy your config file: my_env-settings.yaml. They won't need to run simmate database reset because you did it for them.