Building a database
Make sure you have read the previous section! Setting up a database can be tricky, and the majority of users can avoid it altogether.
Choosing your database engine
Simmate uses Django ORM to build and manage its database, so any Django-supported database can be used with Simmate.
This includes PostgreSQL, MariaDB, MySQL, Oracle, SQLite, and others through third-party packages. NoSQL databases like MongoDB are supported through djongo. The full documentation for Django databases is available here.
However, we strongly recommend choosing Postgres, which we cover in the next section.
Our team uses SQLite (for local testing) and PostgreSQL (for production), so at the moment, we can only offer guidance on these two backends. You are welcome to use the others, but be aware that we haven't thoroughly tested them and won't be able to help you troubleshoot if errors arise.
Intro to Postgres setup
PostgreSQL is free and open-source, so you can avoid costs and set it up manually.
However, it's MUCH easier to use a database service such as DigitalOcean, Linode, GoogleCloud, AWS, Azure, or another provider. These providers set up the database for you through a nice user-interface.
If you still want to manually build a Postgres server, there are many tutorials and guides available on how to do this (1, 2, etc.). Just be aware that this can take a lot of time AND your final database connection may be slower if your team works across multiple locations.
Set up Postgres with DigitalOcean
Intro & expected costs
Our team uses DigitalOcean, where the starter database server (~$15/month) is plenty for Simmate usage. You'll only need >10GB if you are running >100,000 structure relaxations or frequently using unit cells with >1000 atoms.
(i) create an account
To start, make an account on DigitalOcean using this link (which uses our referral). We recommend using your GitHub account to sign in. This referral link does two things:
- DigitalOcean gives you $100 credit for servers (for 60 days)
- DigitalOcean gives the Simmate team $10 credit, which will help fund our servers
If you have any issues, please make sure that DigitalOcean is still actually offering this deal here. Simmate is not affiliated with DigitalOcean.
(ii) create the cloud database
- On our DigitalOcean dashboard, click the green "Create" button in the top right and then select "Database". It should bring you to this page.
- For "database engine", select the newest version of PostgreSQL (currently v14)
- The remainder of the page's options can be left at their default values.
- Select Create a Database Cluster when you're ready.
- On the new homepage for your cluster, there is a "Get Started" button. We will go through this dialog in the next section.
Note that this is a database cluster, which can host multiple databases (each with its own tables).
(iii) connect to the database
Before we set up our own database on this cluster, we are first going to try connecting to the default database on it (named defaultdb).
- On your new database's page, you'll see a "Getting Started" dialog -- select it!
- "Restrict inbound connections" is completely optional, and beginners should skip it for now. We skip it because if you'll be running calculations on a supercomputer/cluster, then you'll need to add ALL of the associated IP addresses in order for connections to work properly. That's a lot of IP addresses to grab and configure properly, so we leave this to advanced users.
- "Connection details" is what we need to give to Simmate/Django. Let's copy this information. As an example, here is what the details look like on DigitalOcean:
username = doadmin
password = asd87a9sd867fasd
host = db-postgresql-nyc3-49797-do-user-8843535-0.b.db.ondigitalocean.com
port = 25060
database = defaultdb
sslmode = require
In your Simmate Python environment, make sure you have the Postgres engine installed. The package is psycopg2, which lets Django talk with Postgres. To install it, run the command:
conda install -n my_env -c conda-forge psycopg2
We need to pass this information to Simmate (which connects using Django). To do this, add a file named my_env-database.yaml (using your conda env name) to your Simmate config directory (~/simmate) with the following content. Be sure to substitute in your connection information, and note that ENGINE tells Django we are using Postgres:
default:
  ENGINE: django.db.backends.postgresql
  HOST: db-postgresql-nyc3-49797-do-user-8843535-0.b.db.ondigitalocean.com
  NAME: defaultdb
  USER: doadmin
  PASSWORD: asd87a9sd867fasd
  PORT: 25060
  OPTIONS:
    sslmode: require
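The mapping from DigitalOcean's connection details to the config file layout can also be scripted. The following is a minimal sketch and not part of Simmate: details_to_settings is a hypothetical helper that parses "key = value" lines (the format shown above) into the same nested layout as the YAML file. It uses only the standard library, and writing the result out as YAML is left to you.

```python
# Hypothetical helper (not part of Simmate): convert "key = value" connection
# details into the nested layout used in my_env-database.yaml.


def details_to_settings(details: str) -> dict:
    # Parse each "key = value" line into a lookup table
    raw = {}
    for line in details.strip().splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        raw[key.strip()] = value.strip()

    # Rename the keys to the ones Django expects
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "HOST": raw["host"],
            "NAME": raw["database"],
            "USER": raw["username"],
            "PASSWORD": raw["password"],
            "PORT": int(raw["port"]),
            "OPTIONS": {"sslmode": raw["sslmode"]},
        }
    }


example = """
username = doadmin
password = asd87a9sd867fasd
host = db-postgresql-nyc3-49797-do-user-8843535-0.b.db.ondigitalocean.com
port = 25060
database = defaultdb
sslmode = require
"""

settings = details_to_settings(example)
print(settings["default"]["NAME"])
```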
- Make sure you can connect to this database on your local computer by running the following in Spyder:
from simmate.configuration.django.settings import DATABASES

print(DATABASES)  # this should give your connect info!
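Printing DATABASES confirms that your settings were read, but it does not by itself open a connection. As a rough extra check, the sketch below tests whether the host and port accept a TCP connection at all. can_reach is a hypothetical helper built on the standard library, not a Simmate function, and a successful result says nothing about whether your username and password are accepted.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and opens the socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and failed DNS lookups
        return False


# Example usage with the settings loaded above (assumes the "default" entry):
# db = DATABASES["default"]
# print(can_reach(db["HOST"], int(db["PORT"])))
```

If this returns False, inbound connection restrictions or a firewall on your network are possible causes worth checking.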
(iv) make a separate database for testing (on the same server)
Just as we don't use the (base) environment in Anaconda, we don't want to use the default database defaultdb on our cluster. Here we will make a new database, one that we can delete if we'd like to start over.
- On your Database Cluster page in DigitalOcean, select the "Users & Databases" tab.
- Create a new database using the "Add new database" button and name it simmate-database-00. We name it this way because you may want to make new/separate databases, and numbering is a quick way to keep track of them.
- In your connection settings (from the section above), switch the NAME from defaultdb to simmate-database-00. You will change this in your my_env-database.yaml file.
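If you later create further databases under this numbering scheme, the next name can be derived automatically. This is only an illustrative sketch: next_database_name is a hypothetical helper, and nothing in Simmate or DigitalOcean requires this naming pattern.

```python
import re


def next_database_name(name: str) -> str:
    """Increment the trailing number of a name such as 'simmate-database-00'."""
    match = re.fullmatch(r"(.*?)(\d+)", name)
    if match is None:
        raise ValueError(f"{name!r} does not end in a number")
    prefix, digits = match.groups()
    # Keep the same zero-padded width (00 -> 01, 09 -> 10)
    return f"{prefix}{int(digits) + 1:0{len(digits)}d}"


print(next_database_name("simmate-database-00"))  # simmate-database-01
```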
(v) create a connection pool
When we have a bunch of calculations running at once, we need to make sure our database can handle all of these connections. Therefore, we make a connection pool which allows for thousands of connections. This "pool" works like a waitlist where the database handles each connection request in order.
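To make the waitlist idea concrete, here is a small simulation using only the Python standard library. It is a conceptual sketch, not how DigitalOcean or Postgres implement pooling: 50 hypothetical clients share 10 slots (the pool size used later in this step), and each client waits until a slot is free. Note that a semaphore only models the limit on simultaneous connections; it does not guarantee that waiting clients are served in arrival order.

```python
import threading
import time

POOL_SIZE = 10   # number of real database connections in the pool
N_CLIENTS = 50   # number of calculations asking for a connection

pool = threading.BoundedSemaphore(POOL_SIZE)
lock = threading.Lock()
active = 0  # connections in use right now
peak = 0    # highest number of connections in use at any moment


def run_query(client_id: int) -> None:
    global active, peak
    # Wait here until one of the pooled connections is free
    with pool:
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for a short database transaction
        with lock:
            active -= 1


threads = [threading.Thread(target=run_query, args=(i,)) for i in range(N_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{N_CLIENTS} clients served, at most {peak} connections in use at once")
```

However many clients are started, the number of connections in use at once never exceeds POOL_SIZE, which is the property the pool provides to the database.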
- Select the "Connection Pools" tab and then "Create a Connection Pool"
- Name your pool simmate-database-00-pool and select simmate-database-00 for the database
- Select "Transaction" for our mode (the default) and set our pool size to 10 (or modify this value as you wish)
- Create the pool when you're ready!
- You'll have to update your my_env-database.yaml file to these connection settings. At this point your file will look similar to this (note, our NAME and PORT values have changed):
default:
  ENGINE: django.db.backends.postgresql
  HOST: db-postgresql-nyc3-49797-do-user-8843535-0.b.db.ondigitalocean.com
  NAME: simmate-database-00-pool  # THIS LINE WAS UPDATED
  USER: doadmin
  PASSWORD: asd87a9sd867fasd
  PORT: 25061
  OPTIONS:
    sslmode: require
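Only two values differ between the direct connection settings and the pooled ones. The sketch below makes that explicit on a plain dictionary with the same layout as the config file. use_connection_pool is a hypothetical helper, not part of Simmate, and the pool name and port are taken from the example values on this page; use the ones DigitalOcean shows for your own pool.

```python
import copy


def use_connection_pool(settings: dict, pool_name: str, pool_port: int) -> dict:
    """Return a copy of the settings that points at a connection pool."""
    pooled = copy.deepcopy(settings)  # leave the original settings untouched
    pooled["default"]["NAME"] = pool_name
    pooled["default"]["PORT"] = pool_port
    return pooled


direct = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": "db-postgresql-nyc3-49797-do-user-8843535-0.b.db.ondigitalocean.com",
        "NAME": "simmate-database-00",
        "USER": "doadmin",
        "PASSWORD": "asd87a9sd867fasd",
        "PORT": 25060,
        "OPTIONS": {"sslmode": "require"},
    }
}

pooled = use_connection_pool(direct, "simmate-database-00-pool", 25061)
print(pooled["default"]["NAME"], pooled["default"]["PORT"])
```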
(vi) build our database tables
Now that we have set up and connected to our database, we can make our Simmate database tables and start filling them with data! We do this the same way we did without a cloud database:
- In your terminal, make sure you have your Simmate environment activated
- Run the following command:
simmate database reset
- You're now ready to start using Simmate with your new database!
(vii) load third-party data
This step is optional.
With SQLite, we were able to download a prebuilt database with data from third parties already in it. However, our newly created Postgres database is entirely empty.
To load ALL third-party data (~5GB total), you can use the following command. We can also use Dask to run this in parallel and speed things up. Depending on your internet connection and CPU speed, this can take up to 24 hours.
simmate database load-remote-archives --parallel
--parallel will use all cores on your CPU. Keep this in mind if you are already running other programs or calculations on your computer.
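To see how many cores that could mean on your machine, you can ask Python for the count the operating system reports. This is a sketch for orientation only; exactly how Simmate and Dask count and use cores is not covered here, so treat the number as an upper bound rather than a guarantee.

```python
import os

# Number of logical cores reported by the operating system (may be None
# on platforms where it cannot be determined)
n_cores = os.cpu_count()
print(f"This machine reports {n_cores} logical cores")
```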
(viii) sharing the database
If you want to share this database with others, they simply need to copy your config file: my_env-database.yaml. They won't need to run simmate database reset because you already did it for them.
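Since the config file is named after the conda environment, a collaborator whose environment has a different name needs the copy renamed to match. The sketch below shows one way to do that. install_shared_config is a hypothetical helper, not a Simmate command, and the paths and environment name in the usage comment are placeholders.

```python
import shutil
from pathlib import Path


def install_shared_config(shared_file: Path, env_name: str, config_dir: Path) -> Path:
    """Copy a shared database config into place under the local env's name."""
    config_dir.mkdir(parents=True, exist_ok=True)
    # The file name follows the <env name>-database.yaml pattern used above
    target = config_dir / f"{env_name}-database.yaml"
    shutil.copyfile(shared_file, target)
    return target


# Example usage (hypothetical env name and file location):
# install_shared_config(
#     Path("my_env-database.yaml"), "their_env", Path.home() / "simmate"
# )
```

Keep in mind that the file contains the database password, so share it only with people who should have full access.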