Sep 5, 2024
This is a two part series that aims to present SQL as a programming language. We’ll explore its syntax, semantics, and type system and draw parallels between SQL concepts and those in other programming languages.
SQL as a Statically Typed Language
SQL is a statically typed language, meaning you must define a schema before performing operations. The primary SQL type is the table, which is essentially a collection of tuples. From a programming perspective, a table can be seen as an array of sets of fields.
To create a schema for an SQL table, we have to use the CREATE TABLE
command (the example below uses ClickHouse syntax):
This is just a schema definition. It defines fields with their types. To add data to the table, we need to use the INSERT INTO
command:
In statically typed programming languages like Go, we can create a similar structure:
Here is more: https://github.com/pdelewski/querydsl/blob/main/v1/v1.go#L3
The above struct defines just one tuple, but in SQL, a table is a collection of tuples so the exact equivalent would be:
Here is more: https://github.com/pdelewski/querydsl/blob/main/v1/v1.go#L8
Now, translate SQL INSERT INTO
into golang slice append:
As visible, there is clear correspondence between schema definition and insertion in SQL and other programming languages.
SQL’s Table Processing Operations
Now that we've defined the central concept, we can describe SQL's purpose and semantics. Everything in SQL is about tuple processing. As defined above, a table in an imperative programming language is just a collection, an array of tuples.
What operations can we perform on collections of tuples?
A few that come to mind are:
Filter: Remove specific tuples based on criteria.
Project: Remove specific fields from a tuple and return a new version of the tuple with specified fields.
Group: Group tuples by a specific field.
Sort: Sort tuples by a specific field.
Merge: Combine multiple tables to create a larger set.
SQL supports all these operations using different keywords, such as SELECT, WHERE, GROUP BY, ORDER BY, JOIN,
and FROM
.
As mentioned earlier, everything in SQL is about table processing. It starts with a table and then performs some transformation on it.
Building an SQL Query
Let’s build a basic SQL query in three steps:
Specify the table: Use the
FROM
clause.
Filter tuples: Apply a
WHERE
clause to retain tuples where Field1 > 1.
Select fields: SQL expects a list of fields we want to include in the result set. To express this, we use another SQL clause, the
SELECT
clause. The statement below is a complete SQL query that retrieves all fields from each tuple. It uses * as a wildcard, but you can also list all columns explicitly by name. Both statements are equivalent:
As you may have noticed, we built the above statement starting from the middle: first, we used the FROM
clause, then added the WHERE
clause, and finally, we jumped to the beginning of the statement to add the SELECT
clause.
Why is this the case?
This relates to SQL syntax design and its evolution. SQL is an ancient language from a software engineering perspective. It was introduced 50 years ago and has evolved incrementally, meaning some features were added later. The HAVING
clause, for instance, was introduced in SQL-92 and somewhat duplicates the functionality of the WHERE
clause.
However, as the language evolves incrementally, it becomes challenging to refine it consistently without introducing breaking changes. As a result, many features were added over time without refining the original syntax, leading to increased overall complexity.
We will explore this problem in more detail next week (follow Quesma on LinkedIn to get notified when the second part of the series comes out).