I recently stumbled upon a standard SQL feature that was implemented, to my surprise, in HSQLDB. The keyword is CORRESPONDING
, and it can be used with all set operations, including UNION
, INTERSECT
, and EXCEPT
.
Let’s look at the sakila database. It has 3 tables with people in it:
CREATE TABLE actor (
actor_id integer NOT NULL PRIMARY KEY,
first_name varchar(45) NOT NULL,
last_name varchar(45) NOT NULL,
last_update timestamp
);
CREATE TABLE customer (
customer_id integer NOT NULL PRIMARY KEY,
store_id smallint NOT NULL,
first_name varchar(45) NOT NULL,
last_name varchar(45) NOT NULL,
email varchar(50),
address_id smallint NOT NULL,
create_date date NOT NULL,
last_update timestamp,
active boolean
);
CREATE TABLE staff (
staff_id integer NOT NULL PRIMARY KEY,
first_name varchar(45) NOT NULL,
last_name varchar(45) NOT NULL,
address_id smallint NOT NULL,
email varchar(50),
store_id smallint NOT NULL,
active boolean NOT NULL,
username varchar(16) NOT NULL,
password varchar(40),
last_update timestamp,
picture blob
);
Similar, but not the same. What if we wanted to get all the “people” from our database? One way to do that in any ordinary database product is:
SELECT first_name, last_name
FROM actor
UNION ALL
SELECT first_name, last_name
FROM customer
UNION ALL
SELECT first_name, last_name
FROM staff
ORDER BY first_name, last_name
The result might look like this:
|first_name|last_name| |----------|---------| |AARON |SELBY | |ADAM |GOOCH | |ADAM |GRANT | |ADAM |HOPPER | |ADRIAN |CLARY | |AGNES |BISHOP | |AL |GARLAND | |ALAN |DREYFUSS | |... |... |
Using CORRESPONDING
Now, in HSQLDB, and in standard SQL, you can use CORRESPONDING
for this kind of task. For example:
SELECT *
FROM actor
UNION ALL CORRESPONDING
SELECT *
FROM customer
UNION ALL CORRESPONDING
SELECT *
FROM staff
ORDER BY first_name, last_name
The result is this:
|first_name|last_name|last_update | |----------|---------|-----------------------| |AARON |SELBY |2006-02-15 04:57:20.000| |ADAM |GOOCH |2006-02-15 04:57:20.000| |ADAM |GRANT |2006-02-15 04:34:33.000| |ADAM |HOPPER |2006-02-15 04:34:33.000| |ADRIAN |CLARY |2006-02-15 04:57:20.000| |AGNES |BISHOP |2006-02-15 04:57:20.000| |AL |GARLAND |2006-02-15 04:34:33.000| |ALAN |DREYFUSS |2006-02-15 04:34:33.000| |... |... |... |
So, what has happened? The columns FIRST_NAME
, LAST_NAME
, and LAST_UPDATE
are common to all three tables. In other words, if you run this query against the INFORMATION_SCHEMA
in HSQLDB:
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'ACTOR'
INTERSECT
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'CUSTOMER'
INTERSECT
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'STAFF'
You get exactly these 3 columns:
|COLUMN_NAME| |-----------| |FIRST_NAME | |LAST_NAME | |LAST_UPDATE|
In other words, CORRESPONDING
creates the intersection of columns among the subqueries of a set operation (i.e. the “shared columns”), projects those, and applies the set operation that projection. In a way, this is similar to a NATURAL JOIN
, which also tries to find that intersection of columns to produce a join predicate. However, NATURAL JOIN
then projects all of the columns (or the union of the columns), not just the shared ones.
Using CORRESPONDING BY
Just like NATURAL JOIN
, this is a risky operation. As soon as one subquery changes its projection (e.g. because of a table column rename), the result of all such queries will change as well, and it might not even produce a syntax error, just a different result.
In fact, in the above example, we probably didn’t even care about that LAST_UPDATE
column. It was included in the UNION ALL
set operation by accident, just like NATURAL JOIN
would join using LAST_UPDATE
by accident.
With joins, we can use JOIN .. USING (first_name, last_name)
to at least specify by which shared column names we want to join the two tables. With CORRESPONDING
, we can supply the optional BY
clause for the same purpose:
SELECT *
FROM actor
UNION ALL CORRESPONDING BY (first_name, last_name)
SELECT *
FROM customer
UNION ALL CORRESPONDING BY (first_name, last_name)
SELECT *
FROM staff
ORDER BY first_name, last_name;
This now produces only the two desired columns:
|first_name|last_name| |----------|---------| |AARON |SELBY | |ADAM |GOOCH | |ADAM |GRANT | |ADAM |HOPPER | |ADRIAN |CLARY | |AGNES |BISHOP | |AL |GARLAND | |ALAN |DREYFUSS | |... |... |
In fact, this way, we could even use the syntax meaningfully for INTERSECT and EXCEPT, e.g. to find customers who share their names with an actor:
SELECT *
FROM actor
INTERSECT CORRESPONDING BY (first_name, last_name)
SELECT *
FROM customer
ORDER BY first_name, last_name;
Producing:
|first_name|last_name| |----------|---------| |JENNIFER |DAVIS |
Other dialects
I haven’t encountered this syntax many times in other dialects before. Perhaps, it will ship to PostgreSQL in the future. A branch has been worked on by Vik Fearing:
jOOQ might soon support it in the API / parser / translator:
https://github.com/jOOQ/jOOQ/issues/5285
from Java, SQL and jOOQ. https://ift.tt/3I559Ah
via IFTTT
No comments:
Post a Comment