Over the past 13 years, jOOQ has accrued quite some internal features, which you, the user, are not exposed to. One very interesting feature is the capability for any arbitrary jOOQ expression tree element to push a SQL fragment up to a higher level.
How does it work?
The jOOQ expression tree model
When you write a jOOQ query, you’re really creating an expression tree of your SQL statement that just happens to look like actual SQL syntax. For example:
ctx.select(BOOK.ID, BOOK.TITLE) .from(BOOK) .where(BOOK.AUTHOR_ID.eq(1)) .fetch();
In jOOQ, this is nothing else than an expression tree of this form
Select (Statement)
|
+------------------+-----------------+
| | |
Select From Where
| | |
+-----+------+ | |
| | | |
TableField TableField Table CompareCondition
| | | | | | | |
Table Field Table Field BOOK TableField EQ Val
| | | | | | |
BOOK ID BOOK TITLE Table Field 1
| |
BOOK AUTHOR_ID
When generating the SQL, jOOQ mostly just traverses the tree brepth first (just kidding. It’s mostly depth first, but some child elements on the same level are often considered, prior to going down one level), collecting each element into a StringBuilder
to the expected form:
SELECT book.id, book.title FROM book WHERE book.author_id = 1
Each of these so-called QueryPart expression tree elements can decide for itself how it’s going to render its SQL. For example, the CompareCondition
will roughly generate this sequence:
<lhs> <operator> <rhs>
… further delegating SQL generation to its children, whatever they may be. A TableField
will decide whether to fully / partially / or not qualify its Field
reference at all, etc., e.g. based on the schema mapping (multi tenancy) feature.
If you’re using functions, such as Substring
, that function can decide on its own how it should generate its SQL. As can be seen in the manual, these are all the same:
-- ACCESS mid('hello world', 7) -- ASE, SQLDATAWAREHOUSE, SQLSERVER substring('hello world', 7, 2147483647) -- AURORA_MYSQL, AURORA_POSTGRES, COCKROACHDB, CUBRID, H2, -- HANA, HSQLDB, IGNITE, MARIADB, MEMSQL, MYSQL, POSTGRES, REDSHIFT, -- SNOWFLAKE, SYBASE, VERTICA substring('hello world', 7) -- DB2, DERBY, INFORMIX, ORACLE, SQLITE substr('hello world', 7) -- FIREBIRD, TERADATA substring('hello world' FROM 7)
This way, a jOOQ expression tree can emulate any syntax in any dialect.
But what if the emulation is not local?
Non local emulations
Sometimes, a QueryPart needs to assume the presence of a non-local syntactic element to work. A recent case was https://github.com/jOOQ/jOOQ/issues/11366.
When writing this procedural logic in jOOQ:
for_(i).in(1, 10).loop( insertInto(t).columns(a).values(i) )
Of course, you wouldn’t do this. You’d write a bulk insert statement instead and solve this with SQL only! But you have your reasons, right?
Then, the indexed FOR loop may have to be emulated in some dialects, using an equivalent WHILE statement. So, instead of this straight forward procedural SQL generation, which we might get in Oracle for example:
FOR i IN 1 .. 10 LOOP INSERT INTO t (a) VALUES (i); END LOOP;
… we generate this in MySQL, more or less:
DECLARE i INT DEFAULT 1; WHILE i <= 10 DO INSERT INTO t (a) VALUES (i); SET i = i + 1; END WHILE;
This can still be done completely locally, as shown before. There’s a FOR
expression tree element that can locally generate the DECLARE
and WHILE
query parts, instead. But what if local variables aren’t possible? What if there’s no block scope, as in Firebird?
In Firebird, all your local variables must be declared in a top level declaration section. If we run the above in an anonymous block, the correct generated procedural SQL would be this:
EXECUTE BLOCK AS DECLARE i INT; BEGIN :i = 1; -- The loop variable must still be initialised locally WHILE (:i <= 10) DO BEGIN INSERT INTO t (a) VALUES (i); :i = :i + 1; END END
This is still the case when we further nest the loop in procedural control flow elements, such as:
for_(i).in(1, 10).loop( for_(j).in(1, 10).loop( insertInto(t).columns(a, b).values(i, j) ) )
Of course, you still wouldn’t do this. You’d write a bulk insert statement from a cartesian product instead and solve this with SQL only! But alas, let’s keep the example simple
We now have nested variable declarations, which still work well in MySQL
DECLARE i INT DEFAULT 1; WHILE i <= 10 DO BEGIN DECLARE j INT DEFAULT 1; WHILE j <= 10 DO INSERT INTO t (a, b) VALUES (i, j); SET j = j + 1; END WHILE; END SET i = i + 1; END WHILE;
But in Firebird, the declarations will both have to be pushed up to the top:
EXECUTE BLOCK AS DECLARE i INT; DECLARE j INT; BEGIN :i = 1; -- The loop variable must still be initialised locally WHILE (:i <= 10) DO BEGIN :j = 1; -- The loop variable must still be initialised locally WHILE (:j <= 10) DO BEGIN INSERT INTO t (a, b) VALUES (i, j); :j = :j + 1; END :i = :i + 1; END END
This doesn’t handle all the edge cases yet (e.g. some dialects allow for “hiding” local variables, such as PL/SQL) but it goes a long way already for simple procedures, functions, and triggers, all of which will be supported starting in jOOQ 3.15.
How does it work?
Alternative 1: Expression tree transformation
There are numerous ways to make such an expression tree transformation work. In the long run, we will re-design our internal expression tree model, and make it available also as a public API to those who wish to use the jOOQ parser and expression tree transformation as a standalone product. To some extent, this is already possible using the VisitListener
SPI as shown in this post about row level security, but the current implementation is complex.
Also, it is relatively rare (so far) for an expression tree to require non-local transformation. This means that eagerly trying to look for possible candidates every time is probably overkill.
Alternative 2: Lazy expression tree transformation
We could transform the expression tree “lazily”, i.e. still assuming it is unnecessary, and when it is, throw an exception and start again. We actually do this, the “pattern” in jOOQ is called a ControlFlowSignal
, and it is mainly used to work around the maximum number of bind parameters per statement limit of different dialects. I.e. we just count bind values, and if there are more than 2000 in SQL Server (2100 are supported by SQL Server, but only 2000 by jtds), then we just re-generate the SQL from scratch using inline values in a static statement.
As always with jOOQ, you can re-configure these limits to your own value.
Another case is when you forgot to turn on the ROWNUM
to LIMIT
transformation when migrating off Oracle’s old ROWNUM
filtering. It would be silly to eagerly search for ROWNUM
instances every time. Instead, we just re-generate the entire SQL query when we encounter one, and when you’re not using Oracle.
The assumption here is that these things happens very rarely, and if they do, you haven’t thought of it, and you don’t want the query to fail in production. The fact that a probably already slow query takes just a bit more time for jOOQ to generate is a price worth paying for the query to still Just Work
Alternative 3: Patching generated SQL strings
Now, this is what we are actually doing.
Better assume almost all SQL transformations are local (as in the Substring
example), and patch the SQL in case they’re not. In the end, we’re just generating SQL strings! And as such, why not introduce an infrastructure where we can put special markers to special text areas, and then replace that area with alternative SQL content.
Without the fix #11366, the generated code may have looked like this:
EXECUTE BLOCK AS -- Special marker here BEGIN -- Entering scope DECLARE i INT DEFAULT 1; WHILE (:i <= 10) DO BEGIN DECLARE j INT DEFAULT 1; WHILE (:j <= 10) DO BEGIN INSERT INTO t (a, b) VALUES (i, j); :j = :j + 1; END :i = :i + 1; END -- Exiting scope END
This doesn’t work in Firebird, so we apply the fix. Note there’s a cheap, special marker that is placed by the SQL generation of anonymous blocks, but also for procedures, functions, and triggers, e.g.
CREATE FUNCTION func1() RETURNS INTEGER AS -- Special marker here BEGIN -- Entering scope RETURN 1; -- Exiting scope END
Now, whenever the org.jooq.impl.DeclarationImpl
query part generates its SQL locally, then, instead of generating something like:
DECLARE i INT DEFAULT 1; DECLARE j INT;
We generate (locally)
:i = 1; -- j isn't initialised here, so nothing needs to be done locally
At the same time, we push the org.jooq.impl.DeclarationImpl
into some context variable that is visible to the entire scope (see “entering scope” and “exiting scope” comments).
As soon as we exit the scope, we must render all collected declarations, this time without the defaults, e.g.
DECLARE i INT; DECLARE j INT;
… and then insert that rendered SQL right where the marker was located. All subsequent markers, if any, are then shifted by the difference of text lengths, of course.
Applications in jOOQ
This is currently used a few times within jOOQ:
- To emulate calling Oracle PL/SQL functions with
BOOLEAN
parameters / return values. We patch the generated SQL producing a syntheticWITH
clause with someBOOLEAN
toNUMBER
translation logic. Since Oracle 12c, Oracle supports PL/SQL embedded in WITH, which is quite a nifty feature! - The entire implicit JOIN feature is implemented this way! Markers delimit each table in the
FROM
clause (e.g.FROM BOOK
), and if any path originating from any such marked table is encountered in the query (e.g.SELECT BOOK.AUTHOR.FIRST_NAME
), then 1) instead of generating the path, a synthetic alias for the path is used to qualify the column, 2) instead of generating theFROM BOOK
table, a syntheticLEFT JOIN
orINNER JOIN
is generated, joining all the necessary to-one relationships. An example will be shown below. - The above Firebird (and possibly also T-SQL, let’s see) procedural local variable scoping fixes are implemented this way.
- A few emulations that require prepending SQL to the complete statement, such as
CREATE OR REPLACE PROCEDURE x
emulations that prependDROP PROCEDURE x
toCREATE PROCEDURE x
work in a similar fashion. These types of emissions are “special” in that they add another statement to the statement batch. This means we’ll also have to be careful to skip any possible result sets or update counts this produces when calling the batch from JDBC.
Future applications may include:
- More top level CTE, which are quite handy for a variety of emulations
An example for implicit joins:
SELECT book.author.first_name, book.author.last_name FROM book -- Special marker around book
The above SQL doesn’t work in any dialect, it’s just jOOQ specific. We generate an alias for each unique path, based on the path’s hash code, so the query might look like this, instead:
SELECT -- The path isn't generated, but an alias for it is alias_xyz.first_name, alias_xyz.last_name FROM ( -- The marked "book" table is replaced by a join tree book LEFT JOIN author AS alias_xyz ON book.author_id = author.id )
We simply replace book
by (book LEFT JOIN ...)
in the resulting SQL string. Thanks to our infrastructure of being able to define scopes and register context and variables for each scope, this works for arbitrary levels of nesting. We can always identify the appropriate book
identifier for each path expression, even for things like this:
SELECT book.author.first_name, book.author.last_name, -- Different book tables here, because the nested scope -- hides the "book" identifier from the outer scope (SELECT count(*) FROM book), (SELECT count(DISTINCT book.author.first_name) FROM book), -- Outer book, again (SELECT book.author.first_name) FROM book
The above being emulated like this, by patching two marked occurrences of book
by the same join graph:
SELECT book_xyz.first_name, book_xyz.last_name, -- No patching done to this book (SELECT count(*) FROM book), -- The alias_xyz alias is used again, the path is the same (SELECT count(DISTINCT book_xyz.first_name) -- And the book table is patched again with the same left join FROM ( book LEFT JOIN author AS alias_xyz ON book.author_id = author.id )), -- Outer book, again (SELECT book_xyz.first_name) FROM ( book LEFT JOIN author AS alias_xyz ON book.author_id = author.id )
It sounds as sophisticated as it sounds complex, but it works really really well.
Perhaps, in the future, expression tree transformation will be preferred over patching the result string. So far, with the current applications, this was the path of least resistance, and highest performance.
from Java, SQL and jOOQ. https://ift.tt/2O5Rhz7
via IFTTT
No comments:
Post a Comment