Databases as a Dev Tool
Some years ago I had the view that databases were a persistence tool, that they hadn’t evolved in any significant way in the last decades and that they were a somewhat necessary evil. Today I use databases as a tool among others in my developer’s toolbox. I think there are a number of reasons why my views have changed.
- Knowledge
- Ownership
- Tooling
- Database features
As a .Net developer I seldom interacted directly with the
database. We used an
ORM and
migrations were scary. Running SQL queries
directly was done by selecting text in SQL Server Management
Studio and hitting
F5, preferably by a colleague, hoping we didn’t forget a where
statement.
As a SharePoint consultant I was explicitly trained not to touch the database, as that would void the support agreement and that the database should be considered a black box.
So a large part of my unwillingness to do database work was definitely caused by my lack of knowledge, paired with a culture promoting as many layers of abstraction as possible, i.e baklava code
The database was usually owned and operated by some other part of the organisation than the dev team I was part of. Creating a new database, a new user with appropriate permissions or getting access to a backup involved lead time.
For the rare occasion I would update a stored procedure, I would be handed a high privileged account with a password set 8 years ago. I would do my update and hopefully forget the account credentials. Version control was nowhere in sight.
The database server was always a somewhat aged version of Microsoft SQL Server. Poor JSON support, no arrays, no ranges and not even a boolean.
Having lived with the consequences of poor database design has also made me appreciate the value a sane database design can provide. Without constraints, bad data needs to be handled after reading it from the database, rather than at writing (creating, updating, deleting).
Now that I’ve learned CTE’s and have access to better tooling, I find databases more enjoyable to work with. Let me break it down.
Ownership and Tooling
We have a database server for our team, defined by a Terraform JSON file. To add a new database or an account (Postgres role), we add a name to an array and run a CLI command to apply the change to Dev and Prod respectively. Let’s say we want to add a database for our new “Experiment Y” proof of concept, we add two lines:
By owning our own database server we are also responsible for the costs associated and the routines for backup/restore (with help from team Tooling).
I can connect to the new database using short-lived credentials through a
./scripts/connect-db.sh -e [local|dev|prod]
script in the repo.
For Swedish speakers, there is a podcast episode Kodsnack 546 where the guys from our Tooling team expands on our infrastructure.
Our repos usually have a docker-compose.yml
that spinns up everything we
need. We have a script that seeds the local database from the production
database, anonymises personal identifiable information and dumps it to a local
file. With that file on disk, the local database is seeded on docker compose up
. The repo contains both the application code and all the database code.
Functions, procedures, views, triggers and policies are written with
create or replace
, so they can be changed in the same way as our TypeScript
functions or CSS rules, with a file watcher looking for changes in *.sql
files as well as other source files.
Tests
Another container defined by our docker-compose.yml
runs the tests on file
changes. Let’s look at a test to prevent supplements (like expenses or
allowances) from being placed outside a consultant’s employment (we’re in the
HR business, look at this from a time/expense reporting perspective).
The withHistory
part of this test is the arrange in “arrange, act,
assert” steps of a test. It creates a collective bargain agreement, a
consultant and an employment for the consultant. Only the employment period is
important for this particular test. We then call the command for creating a
supplement (expense or allowance). The command may be simple or complex, but
will result in a database insert in this case. We expect the error code
supplement_outside_employment
for this test. For the successful case we query
the database for the inserted expense.
Constraints between tables are implemented using triggers in this code base. An employment can have multiple employment periods if the conditions have changed during the employment, like an updated salary. Another test asserts that employment periods are contiguous.
Apparently there is no law that require us to hold shift while typing SQL ☝️ We also have better syntax highlighting in our editors than my blog can render.
Migrations
We write migrations by adding a .sql
file to the database folder which is
then run against the local database seeded with realistic data.
With a new feature developed, tested locally in the web interface and some level of automated tests, all code changes, including database code, can be pushed in one commit. The build server then builds the app’s Docker image, creates a temporary database and run all the tests in the repo in the same way as done on our local dev machines. If all tests pass, the Docker image is deployed to our Dev environment and the database migrations get applied to the Dev database server.
ORM
Since I’m talking about using database queries together with other code, we need some kind of transformation between the two. We’re using PostgreSQL and NodeJS with TypeScript and for that we’re using the postgres npm library to interact with Postgres. For an overview of different options, Beyond Fireship has a speedy walkthrough of ORM’s in the YouTube video I tried 8 different Postgres ORMs where the one we’re using is also mentioned. Combined with Zod, that can provide a nice mapping between SQL and TypeScript.
Again, we have better syntax highlighting in our editors than my blog can render, so it’s not just a plain SQL string to look at.
Database features
Having access to powerful features and sensible syntax makes it attractive to implement more logic in the database, especially when operating on larger datasets. Extracting large chunks of data into memory, performing the logic and then applying the updates back to the database is sometimes not feasible. This does couple the particular database engine tighter to the rest of the system, but the idea of simply swapping one database engine for another one is only relevant when viewing the database as pure persistence. Replacing Postgres with another database engine in our applications would be like replacing NodeJS with .NET, or Python with Java or whatever you might come up with – basically a rewrite.
By implementing constraints at the database level, you don’t have to deal with bad data quality. Using row-level security (RLS) decreases the risk of exposing too much data and ensures only authorised people make changes, like preventing your colleagues to see your salary and only letting your manager change it. This is also easy to test: just create two rows, one a user is supposed to see and one the user is not supposed to see. Make a select query as the user and see which rows were returned.
modern-sql.com contains a boat load of explanations of SQL features and Postgres’ Feature Matrix clearly shows that database development hasn’t stood still since I got out of university.