About

Friday, January 14, 2022

A Rarely Seen, but Useful SQL Feature: CORRESPONDING

I recently stumbled upon a standard SQL feature that was implemented, to my surprise, in HSQLDB. The keyword is CORRESPONDING, and it can be used with all set operations, including UNION, INTERSECT, and EXCEPT.

Let’s look at the sakila database. It has 3 tables with people in it:

CREATE TABLE actor (
    actor_id integer NOT NULL PRIMARY KEY,
    first_name varchar(45) NOT NULL,
    last_name varchar(45) NOT NULL,
    last_update timestamp
);

CREATE TABLE customer (
    customer_id integer NOT NULL PRIMARY KEY,
    store_id smallint NOT NULL,
    first_name varchar(45) NOT NULL,
    last_name varchar(45) NOT NULL,
    email varchar(50),
    address_id smallint NOT NULL,
    create_date date NOT NULL,
    last_update timestamp,
    active boolean
);

CREATE TABLE staff (
    staff_id integer NOT NULL PRIMARY KEY,
    first_name varchar(45) NOT NULL,
    last_name varchar(45) NOT NULL,
    address_id smallint NOT NULL,
    email varchar(50),
    store_id smallint NOT NULL,
    active boolean NOT NULL,
    username varchar(16) NOT NULL,
    password varchar(40),
    last_update timestamp,
    picture blob
);

Similar, but not the same. What if we wanted to get all the “people” from our database? One way to do that in any ordinary database product is:

SELECT first_name, last_name
FROM actor
UNION ALL
SELECT first_name, last_name
FROM customer
UNION ALL
SELECT first_name, last_name
FROM staff
ORDER BY first_name, last_name

The result might look like this:

|first_name|last_name|
|----------|---------|
|AARON     |SELBY    |
|ADAM      |GOOCH    |
|ADAM      |GRANT    |
|ADAM      |HOPPER   |
|ADRIAN    |CLARY    |
|AGNES     |BISHOP   |
|AL        |GARLAND  |
|ALAN      |DREYFUSS |
|...       |...      |

Using CORRESPONDING

Now, in HSQLDB, and in standard SQL, you can use CORRESPONDING for this kind of task. For example:

SELECT *
FROM actor
UNION ALL CORRESPONDING
SELECT *
FROM customer
UNION ALL CORRESPONDING
SELECT *
FROM staff
ORDER BY first_name, last_name

The result is this:

|first_name|last_name|last_update            |
|----------|---------|-----------------------|
|AARON     |SELBY    |2006-02-15 04:57:20.000|
|ADAM      |GOOCH    |2006-02-15 04:57:20.000|
|ADAM      |GRANT    |2006-02-15 04:34:33.000|
|ADAM      |HOPPER   |2006-02-15 04:34:33.000|
|ADRIAN    |CLARY    |2006-02-15 04:57:20.000|
|AGNES     |BISHOP   |2006-02-15 04:57:20.000|
|AL        |GARLAND  |2006-02-15 04:34:33.000|
|ALAN      |DREYFUSS |2006-02-15 04:34:33.000|
|...       |...      |...                    |

So, what has happened? The columns FIRST_NAME, LAST_NAME, and LAST_UPDATE are common to all three tables. In other words, if you run this query against the INFORMATION_SCHEMA in HSQLDB:

SELECT column_name
FROM information_schema.columns
WHERE table_name = 'ACTOR'
INTERSECT
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'CUSTOMER'
INTERSECT
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'STAFF'

You get exactly these 3 columns:

|COLUMN_NAME|
|-----------|
|FIRST_NAME |
|LAST_NAME  |
|LAST_UPDATE|

In other words, CORRESPONDING creates the intersection of columns among the subqueries of a set operation (i.e. the “shared columns”), projects those, and applies the set operation that projection. In a way, this is similar to a NATURAL JOIN, which also tries to find that intersection of columns to produce a join predicate. However, NATURAL JOIN then projects all of the columns (or the union of the columns), not just the shared ones.

Using CORRESPONDING BY

Just like NATURAL JOIN, this is a risky operation. As soon as one subquery changes its projection (e.g. because of a table column rename), the result of all such queries will change as well, and it might not even produce a syntax error, just a different result.

In fact, in the above example, we probably didn’t even care about that LAST_UPDATE column. It was included in the UNION ALL set operation by accident, just like NATURAL JOIN would join using LAST_UPDATE by accident.

With joins, we can use JOIN .. USING (first_name, last_name) to at least specify by which shared column names we want to join the two tables. With CORRESPONDING, we can supply the optional BY clause for the same purpose:

SELECT *
FROM actor
UNION ALL CORRESPONDING BY (first_name, last_name)
SELECT *
FROM customer
UNION ALL CORRESPONDING BY (first_name, last_name)
SELECT *
FROM staff
ORDER BY first_name, last_name;

This now produces only the two desired columns:

|first_name|last_name|
|----------|---------|
|AARON     |SELBY    |
|ADAM      |GOOCH    |
|ADAM      |GRANT    |
|ADAM      |HOPPER   |
|ADRIAN    |CLARY    |
|AGNES     |BISHOP   |
|AL        |GARLAND  |
|ALAN      |DREYFUSS |
|...       |...      |

In fact, this way, we could even use the syntax meaningfully for INTERSECT and EXCEPT, e.g. to find customers who share their names with an actor:

SELECT *
FROM actor
INTERSECT CORRESPONDING BY (first_name, last_name)
SELECT *
FROM customer
ORDER BY first_name, last_name;

Producing:

|first_name|last_name|
|----------|---------|
|JENNIFER  |DAVIS    |

Other dialects

I haven’t encountered this syntax many times in other dialects before. Perhaps, it will ship to PostgreSQL in the future. A branch has been worked on by Vik Fearing:

jOOQ might soon support it in the API / parser / translator:

https://github.com/jOOQ/jOOQ/issues/5285



from Java, SQL and jOOQ. https://ift.tt/3I559Ah
via IFTTT

No comments:

Post a Comment