About

Wednesday, November 18, 2020

Automatically Transform Oracle Style Implicit Joins to ANSI JOIN using jOOQ

While jOOQ is mostly being used as an internal SQL DSL for embedded, dynamic SQL in Java, where it offers the best solution on the market, jOOQ is increasingly also used for one of its secondary features: Its parser.

Having been introduced in jOOQ 3.9 primarly for the purpose of being able to parse DDL statements, e.g. to reverse engineer your schema for code generation purposes, we’ve added an increasing amount of features and SQL transformation capabilities that allow for the parser to be used as a standalone product, through the command line interface, our website, or through the ordinary jOOQ API.

One feature that has been added to jOOQ 3.14, which is mostly useful to those using jOOQ as a parser, is the capability of transforming old Oracle style implicit joins to ANSI JOIN.

Why avoid “implicit joins”?

The old Oracle style implicit join syntax is supported and properly optimised by most RDBMS vendors. In the past, prior to SQL-92, this is how we used to inner join tables, e.g. when querying the Sakila database:

SELECT *
FROM actor a, film_actor fa, film f
WHERE a.actor_id = fa.actor_id
AND fa.film_id = f.film_id

Granted, the syntax is kind of intuitive. Just declare all the tables you want to fetch data from, and then make sure the proper data is retained only by filtering for matching primary key / foreign key values.

Of course, this can go terribly wrong. For the many obvious reasons, e.g. when you forget a join predicate after adding a table. If the query is complex, this may be hard to debug. The solution is ANSI JOIN. Starting from SQL-92 (almost 30 years now!), this is how we join in most RDBMS:

SELECT *
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id

While it’s still possible to define wrong join predicates, at least it’s no longer possible to forget a predicate, because this is syntactically incorrect (except for MySQL, where, regrettably, the ON clause is optional):

SELECT *
FROM actor a
JOIN film_actor fa -- Syntax error
JOIN film f -- Syntax error

jOOQ’s implicit JOIN

Notice that it is common to refer to the above syntax as “implicit join”, while JPQL and jOOQ recycled the term for another kind of “implicit join”, which is foreign key path based, and even less error prone than the ANSI SQL syntax. With jOOQ, the above query can be written as follows:

ctx.select(
      FILM_ACTOR.actor().asterisk(), 
      FILM_ACTOR.asterisk(),
      FILM_ACTOR.film().asterisk())
   .from(FILM_ACTOR)
   .fetch();

The mere presence of these to-one relationship paths in the query will implicitly add the appropriate LEFT JOIN or INNER JOIN to the FROM clause. This is merely convenience on top of ordinary ANSI JOINs, not a replacement.

Transforming Oracle implicit joins

When you have an old code base that you wish to upgrade and transform all your queries to using ANSI JOIN, use jOOQ for that. You can use jOOQ’s programmatic capabilities (as mentioned before), or the free website https://www.jooq.org/translate.

On the website, just pick the “Oracle style to ANSI JOIN” option, place the following SQL on the left:

Input

SELECT
  a.first_name, 
  a.last_name, 
  count(c.category_id)
FROM
  actor a, 
  film_actor fa, 
  film f,
  film_category fc, 
  category c
WHERE a.actor_id = fa.actor_id
AND fa.film_id = f.film_id
AND fc.category_id = c.category_id
GROUP BY 
  a.actor_id,
  a.first_name,
  a.last_name

Output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
  JOIN film_actor fa
    ON a.actor_id = fa.actor_id
  JOIN film f
    ON fa.film_id = f.film_id
  CROSS JOIN (
    film_category fc
      JOIN category c
        ON fc.category_id = c.category_id
  )
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

And… whoopsies. The output correctly displays the resulting, undesired CROSS JOIN because one of the join predicates was missing:

Yep, the tool already helped! Let’s fix the input query:

Fixed input

SELECT
  a.first_name, 
  a.last_name, 
  count(c.category_id)
FROM
  actor a, 
  film_actor fa, 
  film f,
  film_category fc, 
  category c
WHERE a.actor_id = fa.actor_id
AND fa.film_id = f.film_id
AND f.film_id = fc.film_id -- This was missing
AND fc.category_id = c.category_id
GROUP BY 
  a.actor_id,
  a.first_name,
  a.last_name

Fixed output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
  JOIN film_actor fa
    ON a.actor_id = fa.actor_id
  JOIN film f
    ON fa.film_id = f.film_id
  JOIN film_category fc
    ON f.film_id = fc.film_id
  JOIN category c
    ON fc.category_id = c.category_id
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

This also works if you were using Oracle’s arcane outer join syntax using (+) (or SQL Server’s *=, which has been unsupported for a while now). You might have this input:

Input

SELECT 
  a.first_name, 
  a.last_name, 
  count(c.category_id)
FROM 
  actor a, 
  film_actor fa, 
  film f,
  film_category fc, 
  category c
WHERE a.actor_id = fa.actor_id(+)
AND fa.film_id = f.film_id(+)
AND f.film_id = fc.film_id(+)
AND fc.category_id(+) = c.category_id
GROUP BY 
  a.actor_id,
  a.first_name,
  a.last_name

Producing this output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
  LEFT OUTER JOIN film_actor fa
    ON a.actor_id = fa.actor_id
  LEFT OUTER JOIN film f
    ON fa.film_id = f.film_id
  LEFT OUTER JOIN (
    film_category fc
      RIGHT OUTER JOIN category c
        ON fc.category_id = c.category_id
  )
    ON f.film_id = fc.film_id
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

Err, wat. Whoopsies again! One of the (+) symbols was on the wrong side, which is why we got that RIGHT OUTER JOIN. Again, the tool has shown that the old syntax was quite error prone. Let’s fix it.

Fixed input

SELECT 
  a.first_name, 
  a.last_name, 
  count(c.category_id)
FROM 
  actor a, 
  film_actor fa, 
  film f,
  film_category fc, 
  category c
WHERE a.actor_id = fa.actor_id(+)
AND fa.film_id = f.film_id(+)
AND f.film_id = fc.film_id(+)
AND fc.category_id = c.category_id(+)
GROUP BY 
  a.actor_id,
  a.first_name,
  a.last_name

Fixed output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
  LEFT OUTER JOIN film_actor fa
    ON a.actor_id = fa.actor_id
  LEFT OUTER JOIN film f
    ON fa.film_id = f.film_id
  LEFT OUTER JOIN film_category fc
    ON f.film_id = fc.film_id
  LEFT OUTER JOIN category c
    ON fc.category_id = c.category_id
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

Conclusion

Play around with it and tell us what you think! https://www.jooq.org/translate



from Java, SQL and jOOQ. https://ift.tt/35GkQ0B
via IFTTT

No comments:

Post a Comment