SQL from a Programming Language Perspective — Part I

by Przemek Delewski

This is a two part series that aims to present SQL as a programming language. We’ll explore its syntax, semantics, and type system and draw parallels between SQL concepts and those in other programming languages.

SQL as a Statically Typed Language

SQL is a statically typed language, meaning you must define a schema before performing operations. The primary SQL type is the table, which is essentially a collection of tuples. From a programming perspective, a table can be seen as an array of sets of fields.

To create a schema for an SQL table, we have to use the CREATE TABLE command (the example below uses ClickHouse syntax):

CREATE TABLE table (
                      Field1 Int32,
                      Field2 Float32,
                      Field3 String
) ENGINE = MergeTree()
     ORDER BY Field1;

This is just a schema definition. It defines fields with their types. To add data to the table, we need to use the INSERT INTO command:

INSERT INTO table (field1, field2, field3) VALUES
   (1, 2.2, 'Tom');

In statically typed programming languages like Go, we can create a similar structure:

type tuple struct {
   field1 int32
   Field2 float32
   field2 string
}

Here is more: https://github.com/pdelewski/querydsl/blob/main/v1/v1.go#L3

The above struct defines just one tuple, but in SQL, a table is a collection of tuples so the exact equivalent would be:

type table []tuple

Here is more: https://github.com/pdelewski/querydsl/blob/main/v1/v1.go#L8

Now, translate SQL INSERT INTO into golang slice append:

table1 = append(table1, tuple{4, "d"})

As visible, there is clear correspondence between schema definition and insertion in SQL and other programming languages.

SQL’s Table Processing Operations

Now that we've defined the central concept, we can describe SQL's purpose and semantics. Everything in SQL is about tuple processing. As defined above, a table in an imperative programming language is just a collection, an array of tuples.

What operations can we perform on collections of tuples?

A few that come to mind are:

Filter: Remove specific tuples based on criteria.

Project: Remove specific fields from a tuple and return a new version of the tuple with specified fields.

Group: Group tuples by a specific field.

Sort: Sort tuples by a specific field.

Merge: Combine multiple tables to create a larger set.

SQL supports all these operations using different keywords, such as SELECT, WHERE, GROUP BY, ORDER BY, JOIN, and FROM.

As mentioned earlier, everything in SQL is about table processing. It starts with a table and then performs some transformation on it.

Building an SQL Query

Let’s build a basic SQL query in three steps:

Specify the table: Use the FROM clause.

FROM table1

Filter tuples: Apply a WHERE clause to retain tuples where Field1 > 1.

FROM table1 WHERE field1 > 1

Select fields: SQL expects a list of fields we want to include in the result set. To express this, we use another SQL clause, the SELECT clause. The statement below is a complete SQL query that retrieves all fields from each tuple. It uses * as a wildcard, but you can also list all columns explicitly by name. Both statements are equivalent:

SELECT * FROM table1 WHERE field1 > 1
SELECT field1, field2, field3 FROM table1 WHERE field1 > 1

As you may have noticed, we built the above statement starting from the middle: first, we used the FROM clause, then added the WHERE clause, and finally, we jumped to the beginning of the statement to add the SELECT clause.

Why is this the case?

This relates to SQL syntax design and its evolution. SQL is an ancient language from a software engineering perspective. It was introduced 50 years ago and has evolved incrementally, meaning some features were added later. The HAVING clause, for instance, was introduced in SQL-92 and somewhat duplicates the functionality of the WHERE clause.

However, as the language evolves incrementally, it becomes challenging to refine it consistently without introducing breaking changes. As a result, many features were added over time without refining the original syntax, leading to increased overall complexity.

We will explore this problem in more detail next week (follow Quesma on LinkedIn to get notified when the second part of the series comes out).

Table of Content

Title