About

Wednesday, February 2, 2022

Traversing jOOQ Expression Trees with the new Traverser API

Starting from jOOQ 3.16, we’re investing a lot into opening up our internal query object model (QOM) as a public API. This is mainly useful for people who use jOOQ’s parser and wish to access the parsed expression tree, or to transform SQL, e.g. to implement row level security in jOOQ.

But occasionally, even with ordinary jOOQ dynamic SQL usage, it can be useful to access the expression tree.

Please note that as of jOOQ 3.16, all of this new API is experimental and thus subject to incompatible changes in the future. Use it at your own risk.

The Query Object Model (QOM) API

The first improvement is to provide an API to the query object model itself. A new type called org.jooq.impl.QOM contains all of this new public API, while the implementations are still the same old ones, in the org.jooq.impl package, but with package-private visibility.

When you create a SUBSTRING() function call expression, you’re getting a Field<String> expression that implements QOM.Substring. On that type, you can call various accessor methods always starting with a "$" sign to access the function arguments:

// Create an expression using the DSL API:
Field<String> field = substring(BOOK.TITLE, 2, 4);
// Access the expression's internals using the model API
if (field instanceof QOM.Substring substring) {
    Field<String> string = substring.$string();
    Field<? extends Number> startingPosition = 
        substring.$startingPosition();
    Field<? extends Number> length = substring.$length();
}

Some things that may be subject to change:

1. It’s not clear yet if the DSL method substring() returns the QOM type Substring, or the DSL type Field. There are pros and cons to both, though there’s a slight preference for keeping the QOM type out of sight for DSL users.

2. The "$" prefix is used to clearly distinguish between the DSL API (no prefix) and the QOM API ("$" prefix) as the type hierarchy is now shared between the two APIs, and it should be clear for users whether they’re constructing jOOQ objects for usage in the DSL, or whether they’re manipulating objects of the expression tree.

For each accessor, there’s also a “mutator”, a method that produces a new QOM type containing the mutated value. All of the QOM type is immutable, so the original Substring instance isn’t affected by a modification like this:

Substring substring1 = (Substring) substring(BOOK.TITLE, 2, 4);
Substring substring2 = substring1
    .$startingPosition(val(3))
    .$length(val(5));

assertEquals(substring2, substring(BOOK.TITLE, 3, 5));

All of the above API, the accessors, and the mutators will be available to all jOOQ editions, including the jOOQ Open Source Edition.

Expression tree traversal

The real fun starts when you want to traverse the expression tree, e.g. to look for the presence of objects, to collect objects, etc. For this, we have introduced the new Traverser API in the commercial jOOQ distributions.

A Traverser works quite similarly as a JDK Collector, which traverses a Stream and collects elements into some data structure. But the Traverser operates on a tree, and thus has a few additional features:

  • It can receive events before and after visiting a tree element (and their subtrees!)
  • It can decide for each tree element, whether the traversal should recurse into the subtree. This is very useful, for example, if you don’t care about traversing subqueries of any kind.
  • It can decide whether to abort traversal early, e.g. when the first object has been found. I’m not aware of JDK Collector offering such short circuiting methods, even though I think that would be useful, there, too. (It’s possible with a Spliterator, but that’s much more cumbersome)
  • It’s not parallel capable. Parallelism is already an optional feature with streams, but with trees, we haven’t found the benefit of supporting that yet, keeping traversals much simpler.

A simple traversal example would be to count all the QueryPart objects in an expression, like this:

// Contains 7 query parts
System.out.println(
    T_BOOK.ID.eq(1).or(T_BOOK.ID.eq(2))
        .$traverse(() -> 0, (c, p) -> c + 1)
);

The simple convenience method provides an auxiliary data structure (here an int), and a function that accumulates every query part into that data structure. The result is the data structure (the int) itself.

Why does it count 7? Because it traverses the following tree:

1: T_BOOK.ID.eq(1).or(T_BOOK.ID.eq(2))
2: T_BOOK.ID.eq(1)
3: T_BOOK.ID
4: 1
5: T_BOOK.ID.eq(2)
6: T_BOOK.ID
7: 2

Or visually:

OR
├── EQ
│   ├── T_BOOK.ID
│   └── 1
└── EQ
    ├── T_BOOK.ID
    └── 2

If you wanted to simply collect each individual QueryPart, just do it like this:

System.out.println(
    T_BOOK.ID.eq(1).or(T_BOOK.ID.eq(2))
        .$traverse(
            () -> new ArrayList<QueryPart>(),
            (list, p) -> {
                list.add(p);
                return list;
            }
        )
);

The output of this is (not native formatting):

[
  ("PUBLIC"."T_BOOK"."ID" = 1 or "PUBLIC"."T_BOOK"."ID" = 2),
  "PUBLIC"."T_BOOK"."ID" = 1,
  "PUBLIC"."T_BOOK"."ID",
  1,
  "PUBLIC"."T_BOOK"."ID" = 2,
  "PUBLIC"."T_BOOK"."ID",
  2
]

This example shows that the tree is traversed in a depth-first manner.

But you don’t have to write such simple Traversers yourself. Any JDK Collector can be used as a Traverser, so the above two examples can be rewritten like this:

System.out.println(
    T_BOOK.ID.eq(1).or(T_BOOK.ID.eq(2))
        .$traverse(Traversers.collecting(Collectors.counting()))
);

System.out.println(
    T_BOOK.ID.eq(1).or(T_BOOK.ID.eq(2))
        .$traverse(Traversers.collecting(Collectors.toList()))
);

Want to collect all the involved tables of a query? No problem!

System.out.println(
    T_BOOK.ID.eq(1).and(T_AUTHOR.ID.eq(3))
        .$traverse(Traversers.collecting(
            Collectors.mapping(
                p -> p instanceof TableField<?, ?> tf 
                    ? tf.getTable() 
                    : p,
                Collectors.filtering(
                    p -> p instanceof Table,
                    Collectors.toSet()
                )
            )
        ))
);

This can be read as:

  • Map all TableField references to their Table containers
  • Filter out all Table references
  • Collect them to a distinct Set of tables.

Producing:

["PUBLIC"."T_BOOK", "PUBLIC"."T_AUTHOR"]

Expression tree transformations

What if you want to replace one expression by another? There are various use-cases, which we’ll eventually support out of the box in the commercial jOOQ editions, but you can also roll your own extensions using this API.

A very simple example of such a transformation would remove redundant boolean negation:

// Contains redundant operators
Condition c = not(not(BOOK.ID.eq(1)));
System.out.println(c.$replace(q ->
    q instanceof Not n1 && n1.$arg1() instanceof Not n2
        ? n2.$arg1()
        : q
));

Despite having explicitly written not(not(x)), the output is just x, or specifically:

"BOOK"."ID" = 1

Actual example use-cases for such transformations include:

Optimisations and replacements of common patterns

There are a few reasons to normalise and improve common patterns of SQL strings:

Starting from jOOQ 3.17, we’ll offer a lot of these transformations out of the box. You can turn them on for different reasons:

  • To generally optimise your SQL output
  • To detect problems in your queries, both implemented via jOOQ API, or when intercepting them via the parser – the rule of thumb being that if this pattern recognition feature finds something to transform, then your own SQL query should be improved. A linter, so to speak.

Out of the box feature starting from jOOQ 3.17: https://github.com/jOOQ/jOOQ/issues/7284

Row level security or shared schema multi tenancy

You can already today implement client-side row level security using jOOQ’s VisitListener SPI, a predecessor to these SQL transformation features that are based on the new query object model. But with the new replacement API, it will be much simpler both for users, as well as for us to support an out of the box row level security feature. In short, imagine that every time you query a restricted table, such as ACCOUNT:

SELECT * FROM account

What you want is to ensure users can only access their own accounts, i.e. this should be patched into the query, transparently for the developer:

SELECT * FROM account WHERE account_id IN (:userAccountList)

A simple algorithm would be to write:

QueryPart q = select(ACCOUNT.ID).from(ACCOUNT);
System.out.println(
    q.$replace(p -> {
        if (p instanceof Select<?> s) {

            // Check if the query contains the relevant table(s) in
            // the FROM clause
            if (s.$from().$traverse(Traversers.containing(ACCOUNT)) && (

                // In the absence of a WHERE clause
                s.$where() == null ||

                // Or, if we haven't already added our IN list
                !s.$where().$traverse(Traversers.containing(
                    x -> x instanceof InList<?> i 
                        && ACCOUNT.ID.equals(i.$arg1())
                ))
            )) {

                // Append a predicate to the query
                // Imagine this reading some context info
                return s.$where(DSL.and(s.$where(), 
                    ACCOUNT.ID.in(1, 2, 3)));
            }
        }
        return p;
    })
);

The result of the above will be:

select "PUBLIC"."ACCOUNT"."ID"
from "PUBLIC"."ACCOUNT"
where "PUBLIC"."ACCOUNT"."ID" in (
  1, 2, 3
)

Notice how the input SQL query doesn’t contain any such predicate. Obviously, this is far from complete. It doesn’t handle outer joins correctly (where the predicate might have to go into the ON clause), and other caveats. Stay tuned for more, in this area!

Out of the box feature without a release target yet:
https://github.com/jOOQ/jOOQ/issues/2682

More use-cases

There are many more use-cases, which we’re planning on supporting out of the box, based on the above feature set. Those include:

  • Soft deletion, transforming DELETE statements into “equivalent” UPDATE .. SET deleted = true statements, as well as SELECT statements into “equivalent” SELECT .. WHERE NOT deleted, see https://github.com/jOOQ/jOOQ/issues/2683
  • Audit column support where we update “audit” fields such as CREATED_AT, CREATED_BY, MODIFIED_AT, MODIFIED_BY whenever they’re touched by any DML query, see https://github.com/jOOQ/jOOQ/issues/1592

Use-case agnosticity

Remember that like most other jOOQ features, this one, too, is completely use-case agnostic. It does not matter if you’re using jOOQ:

  • As an internal DSL to create dynamic (or “static”) SQL queries
  • As a parser to translate between SQL dialects
  • As a parser to enrich your application that is based on a legacy ORM
  • As a diagnostics utility to run checks on your legacy ORM based application

Irrespective of the use-case, you can use this API to analyse and transform your SQL queries.

Limitations (as of jOOQ 3.16)

As mentioned before, thus far, this is an experimental feature, not truly production ready yet. There are quite a few known limitations of the current design and implementation. Please consider this issue for open issues:

https://github.com/jOOQ/jOOQ/issues/12772

The most important limitations so far include:

  • Support only for SELECT, no other statements
  • Traversal does not yet go into JOIN trees or UNION / INTERSECT / EXCEPT subqueries

There are more limitations, but these ones are the most important ones. So, stay tuned for more exciting developments in this area coming soon in the next jOOQ releases.



from Java, SQL and jOOQ. https://ift.tt/36PEYTNXx
via IFTTT

No comments:

Post a Comment