Aug 19, 2024
by Jacek Migdal
When Pancakes saved our SQL performance
During our pilot deployment with a design partner, we discovered Quesma was too eager to churn out SQL queries. The problem came to a head with dashboards, especially one spanning a long time range with 15 panels. The worst offender among them? A single panel that issued 10 SQL queries. Yes, 10.
And these weren’t your run-of-the-mill SQLs. No, many had expensive WHERE clauses and joins that made our ClickHouse server sweat like it was running a marathon in August. The result? Performance that was, let’s say, less than stellar.
We tried patching the issue with some one-off optimizations. Sure, they helped a little, but it felt like using duct tape on a sinking ship. Plus, the code was getting messier than a toddler’s pancake breakfast.
But then, we had a breakthrough. We figured out how to combine all those disparate SQLs—ElasticSearch/OpenSearch Query DSLs—into one uberSQL. Along with some other tweaks, we managed to boost performance by 50x. Meet the Pancake SQL Pattern.
The problem: too many SQLs, too little efficiency
There are several types of panels in the dashboard, such as [0]:
Show the top 10 microservices by the number of requests.
Also, include a sum of service requests that don’t make it into the top 10.
Also, include the count of how much data we scanned.
Plot those top 10 plus others over the last month grouped by day.
RED metrics (rate, number of metrics and p99 duration) over each day.
Moreover, the total metrics of RED for the whole period are as follows.
Show the top 200 microservices by the number of requests. For each, show the top 5 modules by error count. For each module, show the top 3 error categories.
The dashboard is used for drill-downs, meaning more filters (customer ID, team version) and longer time ranges. The bottleneck? Reading from columnar storage. We could’ve just thrown more cloud dollars at the problem, but that would go against our value prop—being 10x more efficient.
We were simply reading too much raw data too many times.
The inspiration: Pancakes and SQL
The idea came from a simple question: how can you find the top 10 services and get the total count of the rest in one shot? Initially, we were running two queries:
Running those queries has a significant drawback - the database has to read the entire table twice. We can avoid that by using SQL window functions, which allow us to collapse these into a single query[1]:
Window functions are less known and sometimes look atypical, but they’ve been around for decades (since SQL:2003) and are supported by almost every major analytical engine.
Since filtering and grouping caused the vast majority of the database load, combining two SQL queries halved the data we needed to scan. This translated into a solid performance gain, especially when ClickHouse ran at full tilt.
Pancakes: the breakfast of SQL champions
When we saw the power of window functions, we immediately wanted to use them in Quesma. However, translating arbitrary aggregations to SQL with window functions is no easy job.
Another lightbulb moment came from a personal experience during paternity leave. I was the primary caregiver, and feeding my son solid food was a challenge. Pancakes, however, were a hit. They were simple and versatile, and he loved them. This got me thinking—maybe pancakes could solve our SQL struggles, too.
In the Pancake SQL Pattern, we introduced new phases to our Query DSL to SQL transformation:
Query DSL
Parsed tree of aggregations
Pancake(s) of aggregations
Generated SQL AST (Abstract Syntax Tree)
A “pancake” is a new structure we defined where aggregations are stacked. Each layer is a grouping with a limit, and between layers, we have metric aggregations—our pancake “fillings.”
Here’s what a pancake might look like:
Metrics fillings: Total sum of requests
Grouping layer: Top 10 services by requests
Metric fillings: Sum of requests and errors for each top 10 service
Grouping layer: By day
Metric fillings: Sum of requests and errors per service per day
Each query is against some table and has additional top filters e.g. env = production.
This all happens in one query using window functions and ranking functions:
As long as groupings are stacked, we can translate all metric functions and groupings into one SQL. QUALIFY is like HAVING or WHERE but for window function. If it’s not available, another CTE can solve this.
All Query DSLs in our data set could be translated into a single pancake. Those that did not fit could still be translated into a few pancakes, reducing the overall number of SQL.
Stacking groupings allows us to consolidate all metric functions and groupings into one SQL. In our dataset, all Query DSLs can be combined into one. Though in some synthetical Query DSLs that don’t fit into a single pancake can usually be squeezed into handfuls, drastically reducing the number of SQL queries
The upside of pancakes
The Pancake SQL Pattern gave us these benefits:
Single Raw Data Query: We once do the most expensive operation (initial full scan).
Consolidation: Many aggregations fit into one pancake.
Efficiency: ClickHouse’s performance became predictable and close to the theoretical maximum throughput.
Simplicity: We used standard SQL, avoiding complex, DB-specific tricks.
Some potential downsides, like data duplication in result sets, didn’t significantly affect performance.
Alternatives we considered (and discarded)
We tried grouping the same queries other than select statements, but it wasn’t enough.
Other ways of grouping queries have also proven less universal at the expense of more complex code.
We tried using temporary memory tables. They worked performance-wise but were much more problematic to manage reliably and had more limitations in this use case.
Materialized views are great for universally speeding up specialized queries, but it is easy to ask queries that can no longer be leveraged, and we are back to baseline performance. We want baseline performance to be outstanding, too.
Wrapping Up: Why Pancakes Matter
As a database gateway, Quesma offers unique ways to optimize performance and make data more accessible. The Pancake SQL Pattern and other optimizations improved performance 50 times.
These seamless improvements are critical for DevOps and security engineers. When troubleshooting in production, every second counts. For many customers, an hour of downtime means over $100k in lost revenue. Tools that perform under pressure aren’t just nice to have—they’re essential.
At Quesma, we’re committed to ensuring that your database won't disappoint when you need your tools the most.
Footnotes:
[0] - Actual requests were anonymized. We translated them to industry-wide equivalents to preserve the design partner's anonymity and make this article easier for the general public to read.
[1] - We used ClickHouse SQL dialect, where window SQL runs after regular selects. Some SQL dialects look different.