Sep 12, 2024

SQL from a Programming Language Perspective — Part II

SQL from a Programming Language Perspective — Part II

SQL from a Programming Language Perspective — Part II

SQL from a Programming Language Perspective — Part II

SQL from a Programming Language Perspective — Part II

SQL from a Programming Language Perspective — Part II

We are back with the second part of our analysis of SQL form a programming language perspective (Part I can be found here).

SQL as a Declarative Language

As the primary purpose of this blog post is to show the correspondence between SQL and programming languages, let’s try to build a straightforward equivalent functionality in another language (we have examples in Go and C++).

First, SQL has another programming language property: it is classified as a declarative language. This means you specify what you want to achieve without detailing how it should be done. The SQL engine will determine the necessary operations to return the expected results.

As we said and showed above, the table is just a collection of tuples (rows), and we defined that as:

type table []tuple

Here is more: https://github.com/pdelewski/querydsl/blob/main/v1/v1.go#L8

As a first step, let’s put some data into our table (the equivalent of SQL INSERT INTO command):

table1 := table{
   {1, "a"},
   {2, "b"},
   {3, "c"},
}

Here is more: https://github.com/pdelewski/querydsl/blob/main/v1/v1.go#L10

Next, we define three primitive operations:

  • from 

  • where

  • project

FROM takes a table slice and returns a new table. This implementation is simplified, as in reality, it should return a new table that is a cartesian product of all arguments.

func from(t ...table) table {
   result := make(table, 0)
   for _, tt := range t {
      result = append(result, tt...)
   }
   return result
}

Here is more: https://github.com/pdelewski/querydsl/blob/main/v1/v1.go#L10

WHERE takes predicate as a parameter giving a way to express filtering behavior:

func (t table) where(predicate func(tuple) bool) table {
   result := make(table, 0)
   for _, tuple := range t {
      if predicate(tuple) {
         result = append(result, tuple)
      }
   }
   return result
}

Here is more: https://github.com/pdelewski/querydsl/blob/main/v1/v1.go#L18

Last but not least, we need the projection operation (we name it project instead of select, as select is a reserved Golang keyword).

func (t table) project(selector func(tuple) tuple) table {
   result := make(table, 0)
   for _, tuple := range t {
      result = append(result, selector(tuple))
   }
   return result
}

Here is more: https://github.com/pdelewski/querydsl/blob/main/v1/v1.go#L28

With defined primitives, we can express the above SQL query:

t1 := from(table1)
t2 := t1.where(func(t tuple) bool {
   return t.field1 > 1
})
t3 := t2.project(func(t tuple) tuple {
   return tuple{t.field1, t.field2}
})

Here is more: https://github.com/pdelewski/querydsl/blob/main/v1/v1.go#L42

The presented implementation is naive and works only with specific statically defined tuple types. Ideally, we want an implementation that can handle arbitrary tuple types. We can achieve this by using generics to express the necessary semantics. Since Go function receivers cannot have generic types, we need to adjust the function signatures slightly.

Our new signatures will look like this:

func where[Tuple any](table []Tuple, predicate 
func(Tuple) bool) []Tuple
func project[Tuple any, ResultTuple any](table []Tuple, mapper func(Tuple) ResultTuple) 
[]ResultTuple

Now, these functions will work with any table type.

Here is more: https://github.com/pdelewski/querydsl/blob/main/v2/v2.go#L33

However, we still lack a robust implementation that can perform Cartesian products of all input tables. Implementing this in Go using generics is not feasible, as Go's generics are not sophisticated enough for such tasks. Nonetheless, this functionality can be achieved at compile time in other programming languages with advanced metaprogramming features, such as C++, D, Zig, or Nim.

Fortunately, the above implementations are only one way to represent tables and tuples. In a real database, these concepts are defined more dynamically.

Then tuple and table can be:

type tuple map[string]interface{}
type table []map[string]interface{}

With such representation, we can rewrite from and other primitives 

Here is more: https://github.com/pdelewski/querydsl/blob/main/v3/v3.go#L10

Have you seen something similar already? Probably yes, as we already created a subset of LINQ. LINQ is an embedded DSL (domain-specific language) invented by Microsoft. It provides a consistent query syntax to retrieve and manipulate data from various data sources, such as collections (like arrays, lists), databases, XML documents, and more. You can find a more complete Golang implementation here: https://github.com/ahmetb/go-linq.  

SQL Syntax Challenges

As mentioned earlier, SQL has some syntax problems. Let’s examine them:

One issue is that SQL can be hard to read. In our earlier example, we built a simple SQL query highlighting these challenges. We had to start defining our query from the middle with the FROM clause, then add the WHERE clause at the end, and finally jump to the beginning to add the SELECT clause. This demonstrates that the syntax order in SQL differs from the actual execution order, which follows a semantic sequence:

  1. FROM: The FROM clause identifies the tables or views from which to retrieve data. If there are JOIN operations, they are also processed at this step.

  2. WHERE: The WHERE clause filters rows based on specified conditions. Only rows that meet these conditions proceed to the next step.

  3. GROUP BY: The GROUP BY clause groups the selected rows into subsets based on the values of one or more columns.

  4. HAVING: The HAVING clause filters the groups created by GROUP BY based on a condition. Only groups that satisfy this condition proceed.

  5. SELECT: The SELECT clause determines which columns or expressions will be returned in the result set. This is also where column aliases are applied.

  6. DISTINCT: The DISTINCT keyword, if used, removes duplicate rows from the result set.

  7. ORDER BY: The ORDER BY clause sorts the result set based on one or more columns.

  8. LIMIT/OFFSET: The LIMIT and OFFSET clauses restrict the number of rows the query returns and allow skipping a specified number of rows.

Another issue is duplicated clauses that do the same thing, such as WHERE and HAVING. Both express logical expressions, but they are used in different contexts.  

SELECT Department, SUM(Salary) AS TotalSalary FROM Employees WHERE Salary > 55000 GROUP BY Department HAVING SUM(Salary) > 110000;

SQL's recursive nature allows inner SELECT statements to be used within other SELECT, WHERE, or FROM clauses, further complicating readability. While the WITH clause (Common Table Expressions, or CTEs) can help alleviate some of these readability issues, it comes with its own set of challenges.

Example of a recursive query containing inner select statements inside select, from, and where:

SELECT
   (SELECT COUNT(*)
    FROM users
    WHERE department_id = departments.id
   ) AS department_user_count,
   department_name,
   (SELECT MAX(salary)
    FROM employees
    WHERE employees.department_id = departments.id
   ) AS max_salary
FROM
   (SELECT id, department_name
    FROM departments
    WHERE EXISTS (
                  SELECT 1
                  FROM employees
                  WHERE employees.department_id = departments.id
          )
       ) AS departments
WHERE
   department_user_count > 10
ORDER BY
   max_salary DESC;

How Quesma helps with SQL

Quesma can perform transformations between different representations as a proxy or intermediate layer. For example, it could potentially convert PRQL or PIPE SQL into SQL, or vice versa. This allows for simpler input representations that can be transformed internally into various SQL dialects. Conversely, you could start with SQL and transform it into something simpler but semantically identical.

Another potential feature is real-time performance optimization. Quesma could rewrite SQL queries dynamically, making optimizations along the way.

The key idea is that Quesma gives the illusion of directly connecting the application to the original data store. In reality, many transformations and optimizations can occur in the background.

If you're interested in learning more, visit our GitHub page or reach out.

Conclusion

SQL is fundamentally about processing tables, a task we also handle in other programming languages. While SQL is powerful, its syntax presents certain challenges. Understanding these can help bridge the gap between SQL and traditional programming approaches. The community is aware of SQL's issues and tries to resolve them using different methods. Some are starting from scratch, like the PRQL language, while others follow more evolutionary approaches, such as the introduction of Pipe syntax by Google Research.

Table of Contents

Title
line
Title
line

Table of Content

Title
line