Michael McGreal

(type-safe) SQL wins

When I first saw SQL, it looked like a relic of the internet waiting to be usurped.

Even inside a knex.js raw() function, SQL was painfully plain text. No syntax highlighting. No type checking. And screaming at me in ALL CAPITALS.

I tried everything I could to avoid it.

TL;DR: For *real-world queries*, nothing beats the flexibility and ecosystem of SQL.
Especially when paired with fast edge-runtime lambdas and type-safe SQL tools such as drizzle-orm.

Prisma felt great at first: defining a schema in a higher-level schema definition language, then letting Prisma write the ugly ALTER TABLE DDL statements.

But after using Prisma in a serverless environment, the negatives became clear:

  • The regular Prisma client couldn't run in edge environments, and the edge-compatible client just routed lambdas through an intermediary Node server where the Prisma engine could process the query – i.e. another stop between the user and the database.
  • Prisma often made multiple round trips to the database for a single query.
  • The query language was not expressive enough for some complicated queries.

Another thing I did like about Prisma was its db push workflow, especially paired with PlanetScale's "zero downtime" migrations feature. PlanetScale's unique horizontal scalability via sharding was also comforting – especially in a serverless environment where database connections can be quickly exhausted. (An honorable mention also goes to Prisma's superb Data Guide.)

But Prisma had to go. I did some research and found interesting alternatives, including Kysely, Phero, Drizzle, and EdgeDB. I chose EdgeDB because:

  • It seemed the most well-funded and well-maintained, with familiar names attached: Vercel's CEO as an investor, and colinhacks on the TypeScript client team (though he has since left).
  • The founding team, MagicStack, are brilliant devs who developed custom core packages for the EdgeDB server, such as uvloop, which in part allows connection pooling at the scale of "10,000+" simultaneous connections.
  • It is built on top of Postgres, so it can perform complex upsert queries in a single round trip (MySQL has no equivalent of PostgreSQL's RETURNING clause, and can only upsert on a primary key or unique index conflict).
  • It had a lightweight HTTP (fetch) client for edge runtimes!
  • It had a beautiful schema definition language that looked similar to TypeScript. Linking an object was as simple as using the link keyword (no foreign key references required).
  • It had a migration CLI tool to handle conflicts and watch for schema changes.
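
To illustrate the schema language's appeal, here is a minimal sketch of what an EdgeDB schema.esdl might look like (the type and property names are my own, and the syntax follows the older arrow style, so treat it as approximate, not as this project's actual schema):

```
module default {
  type Parent {
    required property name -> str {
      constraint exclusive;
    }
    # Linking objects is one keyword – no foreign key references required.
    multi link children -> Child;
  }
  type Child {
    required property name -> str;
  }
}
```

Compare that to hand-writing the equivalent foreign-keyed DDL.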

I spent months devouring their documentation, learning the quirks, building public playgrounds that handled all the complex initialization (with branches showing solutions to common problems), and using the tool for real-world needs.

The initial challenges of EdgeDB became obvious on the first day. The way EdgeDB achieves efficient bulk queries is through its e.params() and e.for() query builder functions (like other query builders, theirs is just a "fancy way to concatenate strings"). These functions are powerful, but they introduce obstacles. I was used to Prisma, which allowed using JavaScript variables defined outside the query builder scope inside the query. This is possible with the EdgeDB query builder too, but not for bulk queries. For example, looping over a JavaScript object in a for..of loop and executing an EdgeDB upsert query on each iteration will do just that – execute many queries.

One of the primary marketing messages of EdgeDB is that it can perform complex bulk queries in a single statement. And it really can. But it requires passing your JavaScript data to EdgeDB via e.params(), then looping over it with e.for(). Inside the query, every operation must be handled using EdgeDB's own query language. Using JavaScript variables or even JavaScript operators (such as the coalescing ?? operator) will fail, because the query builder does not have access to the actual values; it is just building an EdgeQL query string. So to coalesce, for example, you must use EdgeQL's implementation, which in the TypeScript query builder is done via the e.op() function – specifically e.op($.param1, '??', e.str('fallback')). Even the fallback string must be wrapped in an e.str() call or it will fail.

Luckily, the query language is robust and has alternatives for almost any JavaScript or SQL function. But for real-world queries with complicated filter logic and data full of null values, these query builder functions quickly become unwieldy and difficult to maintain. This is probably my biggest complaint: real-world data is full of optional keys and null values, and handling them inside e.params() and e.for() is challenging.
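
The "fancy way to concatenate strings" point is worth dwelling on. Here is a toy, self-contained sketch (none of these names are EdgeDB's API) of why a JavaScript ?? inside a builder cannot work: the builder only assembles text and placeholders, and never sees the runtime values.

```typescript
// A toy "query builder" fragment: text plus bound params.
type Fragment = { text: string; params: unknown[] };

// Hypothetical helper that emits coalescing *as text*. A JavaScript
// `a ?? b` here would coalesce the fragments themselves, not the
// column values – the builder never has the values, only strings.
function coalesce(expr: string, fallback: string): Fragment {
  return { text: `(${expr} ?? <str>$fallback)`, params: [fallback] };
}

const frag = coalesce(".param1", "fallback");
console.log(frag.text); // (.param1 ?? <str>$fallback)
```

This is exactly why the real builder makes you spell out e.op($.param1, '??', e.str('fallback')) instead of letting you use JavaScript's operator.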
Also, EdgeDB cannot generate e.params() definitions from TypeScript types. So even if you have the types for an API response, they cannot be used. Your options are:

  • Pass the JSON data to e.params() as type e.json(), which must then be unpacked inside the e.for() scope using e.json_array_unpack() or e.json_object_unpack(). And since EdgeQL does not know the type of the unpacked data, you must also cast each unpacked value, e.g. e.cast(e.json, e.int64($.param2)).
  • Type the params inline. This is much easier because the values then do not have to be e.cast(). It also supports e.optional() – though currently only as a top-level param, so be prepared to flatten your inputs before passing them to the query builder. One annoyance of this method is that the query builder expects an object exactly as you describe it, nothing more. So if you get a response from an API, you must first omit the values that e.params() is not expecting, or you'll receive a runtime error. Libraries like Remeda help with picking from objects, but it can be cumbersome for the large nested objects typical of API responses.
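
For the "exactly as you describe, nothing more" problem, a small hand-rolled helper is usually enough (Remeda's pick is the library equivalent); this sketch is illustrative, not from my actual codebase:

```typescript
// Keep only the keys the query builder is expecting; drop everything
// else from the (typically much larger) API response object.
function pick<T extends object, K extends keyof T>(
  obj: T,
  keys: readonly K[]
): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const key of keys) {
    if (key in obj) out[key] = obj[key];
  }
  return out;
}

const apiResponse = { name: "Ada", age: 36, unexpectedField: "dropped" };
const params = pick(apiResponse, ["name", "age"]);
console.log(params); // { name: 'Ada', age: 36 }
```

For deeply nested responses you end up writing one of these per nesting level, which is where it gets cumbersome.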

This runtime error also reveals another major benefit of using e.params(): runtime validation without a library like Zod! A community tool that generated e.params() definitions from TypeScript types or JSON objects would make this immensely more usable – similar to quicktype, or any of the Zod generator libraries.

So far in this article, I have countered each positive feature with a negative. And that pattern will continue.

After learning the (many) quirks of this query builder, I find its use of JavaScript closures and the composability of queries 🤯. Look at how many levels of upserting this single query handles, with JavaScript composability inside closures:

e.params(
  { items: e.optional(e.array(e.tuple({ name: e.str, age: e.int16 }))) },
  ($) =>
    e.for(e.array_unpack($.items), (item) =>
      e
        .insert(e.Parent, {
          name: item.name,
          age: item.age,
        })
        .unlessConflict((parent) => ({
          on: parent.name,
          else: e.update(e.Parent, (p) => {
            // This is a JavaScript closure! So you can modify data before returning the object –
            // but only using values from within the scope of e.params().
            const children = e.select(e.Child, (c) => ({
              filter: e.op(c.parentId, '=', p.id),
            }))
            return {
              set: {
                children: e.insert(e.Child, {
                  name: children.name,
                }),
              },
            }
          }),
        }))
    )
)

However, is it worth the complexity? Because this query doesn't even show the ergonomic issues of handling optional values or complicated nested filter logic – which in my real-world cases have easily expanded a query to 200+ lines.

At least for my current needs, the only time I perform complicated bulk queries is in non-user-facing applications, which do not need to respond instantly. So, to avoid handling null values inside the query builder, I end up using single queries inside for..of loops, forfeiting the speed benefits of e.for() bulk queries.

My current user-facing code usually requires only simple CRUD-type queries, such as "update this single object". And, assuming the server is close to the database, the slowest part of such a user request is the time to initiate a Node.js lambda. By moving these simple CRUD queries to an edge runtime, the "cold boot" time drops to milliseconds. From there, a few round trips to the database is actually not a problem.
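
A minimal sketch of such an edge CRUD endpoint, using the Web-standard Request/Response types that edge runtimes (and modern Node) expose; updateObject is a hypothetical single-row query, not a real API:

```typescript
// Stand-in for a single-row UPDATE against a nearby database –
// one round trip, which the millisecond cold boot easily affords.
async function updateObject(
  id: string,
  data: Record<string, unknown>
): Promise<void> {
  // real version: execute the UPDATE here
}

// The whole user-facing handler: parse, update, respond.
async function handler(req: Request): Promise<Response> {
  const { id, ...data } = (await req.json()) as { id: string } & Record<string, unknown>;
  await updateObject(id, data);
  return new Response(JSON.stringify({ ok: true }), {
    headers: { "content-type": "application/json" },
  });
}
```

Nothing clever is needed: once cold boot stops dominating, "one simple query per request" is a perfectly good architecture.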

So at this point, some of my initial assumptions about the core value of EdgeDB's "complex queries in a single round trip" began to unravel.

Note: I have not mentioned some of the benefits and challenges of the migration tool. For example, when a column type changes, instead of truncating data, the migration tool asks you how to handle the transfer. But you must handle the transformation using EdgeQL, not JavaScript. The code I've shown so far uses the TypeScript query builder, not direct EdgeQL, so to use this tool well you must learn two syntaxes – or avoid the query builder and its type safety benefits entirely and write all your queries in EdgeQL directly. That, however, conflicts with a primary reason for using EdgeDB: its type-safe query builder 🤷‍♂️. Another marketed benefit of EdgeQL is that it is language agnostic, so in the relatively rare case that you're managing a multi-language application, or change your application language, your queries will still work. But raw SQL is application-language agnostic too?!

But at least I could use EdgeDB in an edge runtime environment, right? I had been using their httpClient successfully in edge runtimes on a localhost dev server for weeks. When I deployed to production, however, it always errored. This simple error led me deep into their httpClient source files, and even deeper into the actual EdgeDB server, which I manually deployed on multiple clouds (Google Cloud and DigitalOcean).

The first major issue was HTTPS. The reason the httpClient queries worked on localhost but not on my HTTPS production domain is that localhost is HTTP, and so was the EdgeDB server's IP address – thus no "mixed content". To solve it, I needed to route requests to my EdgeDB server through HTTPS. This led me deep into the (mostly undocumented) EdgeDB Linux server's file system, where I performed a complicated process to acquire a Let's Encrypt certificate for one of my production domains. Then, inside Google Cloud DNS, I created a new A record for a subdomain edgedb.myDomain.com that forwarded traffic to the EdgeDB server's IP address. Even after acquiring the certificate, getting the server to use it was challenging (none of the server flags or env vars would recognize it). I eventually had to overwrite the server's default .pem files with the Let's Encrypt certificates (which expire after a few months). Phew. All my EdgeDB TypeScript clients (Node and edge) were now communicating with the server over a secure connection!

The final test – would it work in edge runtimes? After many (many) tests (I really wanted it to work), it still did not. It progressed beyond the original error, but the deeper error was something out of my control. The httpClient obfuscated error messages from the EdgeDB server, so it was hard to debug, and the EdgeDB team was mostly focused on other initiatives. (Though days later, another user submitted a PR to improve error messages in the httpClient.)

At this point, I had to make a difficult (but easy) decision. I needed edge runtime support. I'm sure EdgeDB will resolve this specific error, maybe even soon. But I need it now. Do I abandon this new query language into which I had invested months of intense learning? Many nights away from my family, eating dinner at my desk. I had acquired domain expertise in this tool: its query language, its migration CLI, even its internal server configuration (a bit ironic, since tools like this are supposed to abstract away server problems and let devs focus on application-level code – though I am a much better developer for it now).

I had experienced this feeling before. Actually, I have spent most of my career mastering tools only to abandon them. In a previous career, I was a professional Adobe After Effects and Premiere Pro motion graphics and video editor. And as always, I know the answer is yes: I must abandon it. I've done this now with Firebase (NoSQL), Prisma + PlanetScale, and EdgeDB (PostgreSQL), and am now returning to MySQL. It's been a long journey in which I have tried to solve the exact same problem in four different paradigms. I feel it was necessary to understand the nuances and tradeoffs of each. The parts of SQL I previously scoffed at, I now embrace, because its benefits as a whole are enormous. SQL has been executing real-world complex queries for 50+ years. I would rather not spend another minute wrangling new-age tools to do the same.

Returning to SQL

By this time, new tools were emerging that offered more type safety for SQL than knex.js, such as Kysely. But Kysely did not have a schema design tool, which would be hard to give up after using them in Prisma and EdgeDB. Another tool I had been following in the background, and which was gaining popularity, was Drizzle.

Drizzle offered a different take on schema definition files: its schema file is a plain .ts file! That's right – the schema is defined directly in TypeScript using familiar export and const keywords. This also means there is no in-between step to generate TypeScript files from the schema (i.e. no 'npm run generate'). If you edit the TypeScript file, the rest of your repository instantly sees the new types! Wow. This does increase the TypeScript demands on your machine, but if it ever gets painful, you can likely afford a new machine.
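
For a taste, here is a minimal sketch of such a schema file, assuming drizzle-orm's mysql-core column builders (the table and column names are mine, not from any real project):

```typescript
import { mysqlTable, serial, varchar, int } from "drizzle-orm/mysql-core";

export const parents = mysqlTable("parents", {
  id: serial("id").primaryKey(),
  name: varchar("name", { length: 255 }).notNull(),
});

export const children = mysqlTable("children", {
  id: serial("id").primaryKey(),
  name: varchar("name", { length: 255 }).notNull(),
  // An application-level relation: just a plain column,
  // no foreign key constraint on the database.
  parentId: int("parent_id").notNull(),
});
```

The moment you save this file, every query in the repository type-checks against the new columns.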

The new drizzle push feature offered the familiar workflow I enjoyed with Prisma. And paired with my old friend PlanetScale, it allows "zero downtime" migrations, which I now cannot imagine leaving either. In this workflow, the Drizzle migration CLI is not needed. My entire database workflow has been condensed to: 1. Write the schema in TypeScript – my files instantly get the new types. 2. When I want to commit these changes to the database, yarn push, which pushes the schema changes to the PlanetScale MySQL server. If you're simply renaming fields or adding new columns, the drizzle push command doesn't even ask you to handle any cases, because there are none. It doesn't yet have conflict-handling features like EdgeDB's did, but it's still young, and the Ukraine-based team is known in the community as a fast-moving open-source team. Also, with PlanetScale's "data branching" workflow, the data in your dev branch does not affect data in your production branch.

However, these PlanetScale features do come at a slight cost.

  • MySQL does not offer a RETURNING clause like PostgreSQL's, which enables more sophisticated upsert queries. The best MySQL has is the ON DUPLICATE KEY UPDATE statement, which only fires on primary key or unique index conflicts (not conflicts on arbitrary non-unique columns).
  • PlanetScale offers practically infinite horizontal scaling by sharding the database (using Vitess). But to achieve this massive benefit, they had to remove foreign key constraints from MySQL – i.e. rows are not linked to other rows at the database level. The concept of linking rows remains; it's just not enforced by database policy. For example, a row in table A can still have a column named "columnBId" that references another table, and PlanetScale still encourages you to model relations like this – just at the application level instead. And actually, this is nice. In EdgeDB, it's easy to insert and query linked objects, but good luck deleting them: policy errors and "target policies" become complicated as the schema grows multiple levels of dependencies. Since being freed from link policies, I've found unexpected benefits in managing them at the application level. When working with real-world data, the incoming data is often not in the nested structure my database desires, meaning my single elegant EdgeDB query was suddenly incompatible. Restructuring the EdgeDB query would be an all-day task, so I instead opted to restructure the incoming data before passing it to EdgeDB. But in PlanetScale, without foreign key constraints, I can insert objects in any order, regardless of parent-child relationships; one does not have to exist for the other to be inserted. It's a little more work to ensure the related row's id is added, but overall it is much less effort than in EdgeDB. Deleting objects is easy without link policies. Sure, this can lead to "orphan rows", but orphan rows are relatively inexpensive for the price of flexibility, and can be easily identified and deleted. Also, instead of the user waiting for delete operations to complete, orphan rows can be bulk deleted during slow hours, like overnight. In the modified words of PlanetScale: "any sufficiently advanced application will be managing relations in application logic anyway".
  • PlanetScale also does not support WITH statements (CTEs), so it's harder to compose queries the way I was used to in PostgreSQL.
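
To make the first bullet concrete, here are illustrative upsert statements in both dialects (the users table and its columns are hypothetical):

```typescript
// PostgreSQL: upsert on any unique column and read the row back
// in a single round trip, thanks to RETURNING.
const pgUpsert = `
  INSERT INTO users (email, name) VALUES ($1, $2)
  ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name
  RETURNING id, email, name`;

// MySQL: conflicts are detected only via a PRIMARY KEY or UNIQUE
// index, and there is no RETURNING clause – reading the row back
// means a second query.
const mysqlUpsert = `
  INSERT INTO users (email, name) VALUES (?, ?)
  ON DUPLICATE KEY UPDATE name = VALUES(name)`;
```

With an edge runtime close to the database, that extra SELECT after the MySQL upsert is a non-issue in practice.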

But again, these are all basically non-issues compared to the major benefits PlanetScale provides. The speed of edge runtimes can afford multiple queries (especially when the function runs close to the PlanetScale MySQL server). Sure, I can't do complicated multi-level upsert queries like I could in EdgeDB – but do I really need to? No. And actually, despite the multiple queries, my code is cleaner than it was with EdgeDB, because I can use JavaScript patterns (such as IIFEs, or functional builder patterns). I have complete control over the data using all my familiar, standardized JavaScript techniques! Standardized SQL and standardized JavaScript. Wow, it feels good. These tools (PlanetScale, Drizzle) only exist because developers have been studying SQL's quirks and building around them for decades. Similar to the JavaScript ecosystem moving back toward web standards: why create something new when we can make existing standards easier?

Finally, to Drizzle. Just last week, as I write this, Drizzle released their "relations" API, a tool for managing "application level relations" without actually enforcing foreign key constraints at the database level – i.e. it works well with PlanetScale. So far, it is superb at returning nested data. It doesn't yet help with performing updates or deletes across related data, but that's no problem. It is definitely not as easy to define a link in the Drizzle TypeScript schema file as it was in EdgeDB's schema.esdl file – however, it's just TypeScript, and there is no magic box. The relation fields and what they reference are very visible. Even many-to-many relations are implemented with familiar SQL "junction" or "join" tables: when you add a new many-to-many relation, a new table appears in your database. No magic! Drizzle is just making standard SQL workflows easier and type-safe. It's delightful.

What do I miss from EdgeDB? Computeds. In EdgeDB, computeds could be defined in the schema or ad hoc in queries, which was great for returning aggregations, for example. Does Drizzle offer this? Kind of. The "extras" fields on relational queries let you return custom fields computed by SQL – e.g. concatenating a firstName and lastName. However, at the time of writing, extras cannot access data from related "with" tables; i.e. you cannot return an aggregated value based on child rows, at least not with the relational API. (Technically you could achieve this using Drizzle's type-safe sql`` operator, but you would have to re-select all related fields (double computation), so you'd be better off using the standard Drizzle select API, which does offer aggregations.) However – once again thanks to the speed of edge runtimes – even the (relatively) inelegant alternatives of writing two sequential queries, or over-fetching and filtering on the server, are still fast! And easier for other developers to understand and maintain. Once again, edge runtimes negate the need for a single query. And since SQL has been around forever, even tools like ChatGPT can help with complicated SQL queries. So far, in the cases where I have written Drizzle-assisted raw SQL, it has been much simpler than the EdgeDB equivalent.

A final benefit of adopting SQL standards is that all problems become easier. So far I've mentioned two tools: PlanetScale, which solves horizontal scaling, and Drizzle, which solves type-safe SQL. But there are exponentially more tools designed to integrate with SQL and its ecosystem than with other proprietary protocols. One example: the Drizzle community is building a next-auth adapter, while no such adapter has even been requested for EdgeDB. A bigger example is that I now have access to Retool. Suddenly, the admin app I've been building by hand (mostly as an experiment with Next.js's /app dir and React Server Components) can be built in a fraction of the time and effort in Retool. I'll keep the hand-built admin app for more complex tasks, but for 90% of admin tools, Retool gets the job done – and in a beautiful UI that is pleasing to admin users. It frees me from managing UI concerns like accessibility, state management, and virtualization, which are full of nuances even when using libraries like MUI. I'm happy to hand that problem off to Retool so I can focus on higher-level problems.

In contrast, EdgeDB can accept raw PostgreSQL queries, but only for reads – not writes. Maybe someday they will allow writes too, but at this point, even if they did, I don't think I would return. The speed of edge runtimes changes the equation. And if a company truly needs complicated subqueries, it can likely afford a SQL expert to write custom queries achieving at least 80% of an EdgeQL query (especially on PostgreSQL). Moreover, SQL experts and documentation are abundant – even ChatGPT produces quality SQL given quality input – while finding an EdgeQL expert will be more difficult and expensive.

As much as I tried to avoid SQL in the beginning, dare I say, I'm actually starting to enjoy it! Especially when paired with Drizzle's type safety, things like AND filters with COALESCE operators and params are just easier in SQL. The shouty ALL CAPITALS I originally despised are starting to become endearing. Now, when I see those three little letters, I only hear the beautiful squeaking of the brilliant dolphin in an ocean of easily malleable data. The "My" prefix of SQL becomes more true with every query. I have found my tool in an old language, paired with new tools.