When paginating results in SQL, we use standard SQL OFFSET .. FETCH
or a vendor specific version of it, such as LIMIT .. OFFSET
. For example:
SELECT first_name, last_name FROM actor ORDER BY actor_id OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY
As always, we’re using the Sakila database for this example.
This is rather straightforward. It will give us page 2 out of N pages, with a page size of 10. But how do we calculate these values? How do we know we’re on page 2? How do we know the number of pages N
? Can we calculate this without an extra round-trip e.g. to calculate the total number of actors:
-- Yuck, a second round-trip! SELECT COUNT(*) FROM actor
We can do it with a single SQL query and window functions, but before I explain how to do this, please consider reading this article on why OFFSET
pagination is a bad thing for your performance
If you’re still convinced OFFSET
pagination is what you need, as opposed to keyset pagination, let’s look at how to calculate the above meta data with SQL.
What Metadata Do We Need?
The metadata we typically need to paginate using OFFSET
are these:
TOTAL_ROWS
: The total number of records if we hadn’t paginatedCURRENT_PAGE
: The current page we’re onMAX_PAGE_SIZE
: The maximum page sizeACTUAL_PAGE_SIZE
: The actual page size (when on the last page)ROW
: The actual offsets of the returned rowsLAST_PAGE
: Whether we are on the last page
The maximum page size is something we set to the query, so it doesn’t have to be calculated. Everything else needs to be calculated. And here’s how to do that in a single query
SELECT t.first_name, t.last_name, -- Calculate some meta data COUNT(*) OVER () AS actual_page_size, MAX(row) OVER () = total_rows AS last_page, -- Metadata from the subquery total_rows, row, ((row - 1) / :max_page_size) + 1 AS current_page FROM ( SELECT u.*, -- Calculate some meta data, repeating the ORDER BY from -- the original query COUNT(*) OVER () AS total_rows, ROW_NUMBER () OVER (ORDER BY u.actor_id) AS row -- Original query with all the predicates, joins, as a derived table FROM ( SELECT * FROM actor ) AS u -- Ordering and pagination done here, where :offset is -- The maximum row value of the previous page + 1 ORDER BY u.actor_id OFFSET :offset ROWS FETCH NEXT :max_page_size ROWS ONLY ) AS t ORDER BY t.actor_id
That’s it. Impressive? Don’t be scared, I’ll walk you through these things step by step. And if you ever get confused by SQL syntax, consider this article explaining the logical order of SQL operations, which is, for our example:
FROM
(recurse ordering for derived tables)WHERE
(which the example omitted)WINDOW
calculationsSELECT
(the projection)ORDER BY
OFFSET .. FETCH
Annotating our query, ordering operations logically as 1.1, 1.2, 2.1, 2.2, 2.3, 2.4, 2.5, 3.1, 3.2, 3.3, 3.4:
-- 3.3 SELECT t.first_name, t.last_name, -- 3.2 COUNT(*) OVER () AS actual_page_size, MAX(row) OVER () = total_rows AS last_page, -- 3.3 total_rows, row, ((row - 1) / :max_page_size) + 1 AS current_page -- 3.1 FROM ( -- 2.3 SELECT u.*, -- 2.2 COUNT(*) OVER () AS total_rows, ROW_NUMBER () OVER (ORDER BY u.actor_id) AS row -- 2.1 FROM ( -- 1.2 SELECT * -- 1.1 FROM actor ) AS u -- 2.4 ORDER BY u.actor_id -- 2.5 OFFSET :offset ROWS FETCH NEXT :max_page_size ROWS ONLY ) AS t -- 3.4 ORDER BY t.actor_id
Step by step explanation
First off, the original query SELECT * FROM actor
is wrapped as a derived table called u
. You can do almost anything you want with this original query, applying only a few transformations:
- 1.1, 1.2, 2.1: You need to project (
SELECT
clause) the columns that your original query projected, plus the columns that you need forORDER BY
. Because I’m projecting the right things in the outermost query, and because there’s noDISTINCT
clause in the original query, I conveniently projected*
. Alternatively, I could have projectedFIRST_NAME
,LAST_NAME
(because that is projected in the original query), andACTOR_ID
(because that’s what weORDER BY
). - 2.2: On that derived table
u
, we’re now able to calculate some metadata, includingTOTAL_ROWS
asCOUNT(*) OVER ()
andROW
asROW_NUMBER () OVER (ORDER BY t.actor_id)
. TheCOUNT(*) OVER ()
window function has an empty window specificationOVER ()
, meaning it calculates all the rows that result from theFROM
,WHERE
,GROUP BY
,HAVING
clauses, i.e. fromu
in our particular example. Without a second round-trip! TheROW_NUMBER () OVER (ORDER BY u.actor_id)
orders all the rows inu
byu.actor_id
and assigns unique row numbers to them, according to that ordering. - 2.3: The window functions are calculated implicitly because they’re located in the projection of this derived table. We’re also again going to conveniently project everything from
u.*
, because the outer-most query is the one that projects columns explicitly. - 2.4: The original ordering has been moved here because there is no guarantee that the ordering would have been maintained if we had ordered the contents of
u
. But we need the ordering to calculateOFFSET .. FETCH
right after - 2.5: This is where we paginate. The
OFFSET
corresponds to the maximumROW
value that we’ve encountered before. We start at0
, and with a page size of15
, we use15
on the next page. Remember that while indexes are1
based in SQL,OFFSET
is0
based. - 3.1: All of the above is wrapped again in a derived table, in order to make further calculations on it, namely:
- 3.2: We can again calculate
COUNT(*) OVER ()
, calculating the total number of rows that result from theFROM
,WHERE
,GROUP BY
,HAVING
clauses, i.e. fromt
in our particular example. This time, the number of rows can be no more thanMAX_PAGE_SIZE
, because that’s what theFETCH
(orLIMIT
) clause inside oft
says. But it can be less, too, so this is what we use to calculate theACTUAL_PAGE_SIZE
. Finally, we compareMAX(row) OVER () = total_rows
to see if we’re on the last page, meaning the highest value forrow
in the current page resulting fromt
is compared to the total row count. Another way to calculate theLAST_PAGE
value would be ifACTUAL_PAGE_SIZE < MAX_PAGE_SIZE
, i.e.COUNT(*) OVER () < :MAX_PAGE_SIZE
. - 3.3: In addition to the usual projection of the original columns
FIRST_NAME
,LAST_NAME
(we’re no longer projecting*
now!), we’re doing some final calculations including dividingROW / TOTAL_ROWS
to get the page number. You can calculate more things, such asTOTAL_ROWS / MAX_PAGE_SIZE
to get theTOTAL_PAGES
value. - 3.4: Finally, we have to
ORDER BY t.actor_id
again, don’t let anyone tell you otherwise. In SQL, if you do notORDER BY
, then the ordering is undefined. Sure, it would be silly for an optimiser to re-order things without any good reason. We’ve already ordered the contents of our subqueries in 2.4, but there is no guarantee, that this ordering is stable. Just addDISTINCT
,UNION
, or aJOIN
resulting in a hash join or some random other operator to your query, and the ordering breaks. So, alwaysORDER BY
if ordering is important to you.
And we’re done!
How to do it in jOOQ?
This is the kind of use-case where jOOQ really really shines, because all of this is about dynamic SQL. Your actual business logic is contained in the deeply nested u
table. Everything else is “presentation logic”, which is implemented in SQL for very obvious reasons: To improve performance.
And because you want to implement all of this only once in some library of yours, instead of having to play this game on every query, you make this kind of query dynamic. The utility will look like this:
// Assuming as always the usual static imports, including: // import static org.jooq.impl.DSL.*; // import com.generated.code.Tables.*; static Select<?> paginate( DSLContext ctx, Select<?> original, Field<?>[] sort, int limit, int offset ) { Table<?> u = original.asTable("u"); Field<Integer> totalRows = count().over().as("total_rows"); Field<Integer> row = rowNumber().over().orderBy(u.fields(sort)) .as("row"); Table<?> t = ctx .select(u.asterisk()) .select(totalRows, row) .from(u) .orderBy(u.fields(sort)) .limit(limit) .offset(offset) .asTable("t"); Select<?> result = ctx .select(t.fields(original.getSelect().toArray(Field[]::new))) .select( count().over().as("actual_page_size"), field(max(t.field(row)).over().eq(t.field(totalRows))) .as("last_page"), t.field(totalRows), t.field(row), t.field(row).minus(inline(1)).div(limit).plus(inline(1)) .as("current_page")) .from(t) .orderBy(t.fields(sort)); // System.out.println(result); return result; }
Notice the println for debugging? It will print again something like our original query (but you’ll also see that in your debug log output, by default, with jOOQ):
select t.ACTOR_ID, t.FIRST_NAME, t.LAST_NAME, count(*) over () as actual_page_size, (max(t.row) over () = t.total_rows) as last_page, t.total_rows, t.row, ((t.row / 15) + 1) as current_page from ( select u.*, count(*) over () as total_rows, row_number() over (order by u.ACTOR_ID) as row from ( select ACTOR.ACTOR_ID, ACTOR.FIRST_NAME, ACTOR.LAST_NAME from ACTOR ) as u order by u.ACTOR_ID offset 30 rows fetch next 15 rows only ) as t order by t.ACTOR_ID
And here’s how you call the utility:
System.out.println( paginate( ctx, ctx.select(ACTOR.ACTOR_ID, ACTOR.FIRST_NAME, ACTOR.LAST_NAME) .from(ACTOR), new Field[] { ACTOR.ACTOR_ID }, 15, 30 ).fetch() );
Notice that you can plug in arbitrary SQL fragments into that utility and paginate them. Irrespective of the complexity (including joins, other window functions, grouping, recursion, and what not), jOOQ will have you covered and will now paginate things for you.
The result of the above is:
+--------+----------+---------+----------------+---------+----------+----+------------+ |ACTOR_ID|FIRST_NAME|LAST_NAME|actual_page_size|last_page|total_rows| row|current_page| +--------+----------+---------+----------------+---------+----------+----+------------+ | 31|SISSY |SOBIESKI | 15|false | 200| 31| 3| | 32|TIM |HACKMAN | 15|false | 200| 32| 3| | 33|MILLA |PECK | 15|false | 200| 33| 3| | 34|AUDREY |OLIVIER | 15|false | 200| 34| 3| | 35|JUDY |DEAN | 15|false | 200| 35| 3| | 36|BURT |DUKAKIS | 15|false | 200| 36| 3| | 37|VAL |BOLGER | 15|false | 200| 37| 3| | 38|TOM |MCKELLEN | 15|false | 200| 38| 3| | 39|GOLDIE |BRODY | 15|false | 200| 39| 3| | 40|JOHNNY |CAGE | 15|false | 200| 40| 3| | 41|JODIE |DEGENERES| 15|false | 200| 41| 3| | 42|TOM |MIRANDA | 15|false | 200| 42| 3| | 43|KIRK |JOVOVICH | 15|false | 200| 43| 3| | 44|NICK |STALLONE | 15|false | 200| 44| 3| | 45|REESE |KILMER | 15|false | 200| 45| 3| +--------+----------+---------+----------------+---------+----------+----+------------+
Or, on the last page, with offset 195
+--------+----------+---------+----------------+---------+----------+----+------------+ |ACTOR_ID|FIRST_NAME|LAST_NAME|actual_page_size|last_page|total_rows| row|current_page| +--------+----------+---------+----------------+---------+----------+----+------------+ | 196|BELA |WALKEN | 5|true | 200| 196| 14| | 197|REESE |WEST | 5|true | 200| 197| 14| | 198|MARY |KEITEL | 5|true | 200| 198| 14| | 199|JULIA |FAWCETT | 5|true | 200| 199| 14| | 200|THORA |TEMPLE | 5|true | 200| 200| 14| +--------+----------+---------+----------------+---------+----------+----+------------+
Conclusion
jOOQ is all about dynamic SQL. There’s hardly any SQL feature left that jOOQ doesn’t support. This includes window functions, for example, but also making sure that your dynamic SQL works on a large number of SQL dialects, irrespective of the little syntactic details.
You can build your own libraries to construct re-usable SQL elements from other SQL building blocks as this article has shown, to dynamically create single-query OFFSET
pagination meta data calculation, without performing additional database round trips.
from Java, SQL and jOOQ. https://ift.tt/30uzu7A
via IFTTT
No comments:
Post a Comment