About

Thursday, March 11, 2021

Calculating Pagination Metadata Without Extra Roundtrips in SQL

When paginating results in SQL, we use standard SQL OFFSET .. FETCH or a vendor specific version of it, such as LIMIT .. OFFSET. For example:

SELECT first_name, last_name
FROM actor
ORDER BY actor_id
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY

As always, we’re using the Sakila database for this example.

This is rather straightforward. It will give us page 2 out of N pages, with a page size of 10. But how do we calculate these values? How do we know we’re on page 2? How do we know the number of pages N? Can we calculate this without an extra round-trip e.g. to calculate the total number of actors:

-- Yuck, a second round-trip!
SELECT COUNT(*)
FROM actor

We can do it with a single SQL query and window functions, but before I explain how to do this, please consider reading this article on why OFFSET pagination is a bad thing for your performance

If you’re still convinced OFFSET pagination is what you need, as opposed to keyset pagination, let’s look at how to calculate the above meta data with SQL.

What Metadata Do We Need?

The metadata we typically need to paginate using OFFSET are these:

  • TOTAL_ROWS: The total number of records if we hadn’t paginated
  • CURRENT_PAGE: The current page we’re on
  • MAX_PAGE_SIZE: The maximum page size
  • ACTUAL_PAGE_SIZE: The actual page size (when on the last page)
  • ROW: The actual offsets of the returned rows
  • LAST_PAGE: Whether we are on the last page

The maximum page size is something we set to the query, so it doesn’t have to be calculated. Everything else needs to be calculated. And here’s how to do that in a single query

SELECT 
  t.first_name, 
  t.last_name,

  -- Calculate some meta data
  COUNT(*) OVER () AS actual_page_size,
  MAX(row) OVER () = total_rows AS last_page,

  -- Metadata from the subquery
  total_rows,
  row,
  ((row - 1) / :max_page_size) + 1 AS current_page
FROM (
  SELECT
    u.*,

    -- Calculate some meta data, repeating the ORDER BY from
    -- the original query
    COUNT(*) OVER () AS total_rows,
    ROW_NUMBER () OVER (ORDER BY u.actor_id) AS row

  -- Original query with all the predicates, joins, as a derived table
  FROM (
    SELECT *
    FROM actor
  ) AS u

  -- Ordering and pagination done here, where :offset is
  -- The maximum row value of the previous page + 1
  ORDER BY u.actor_id
  OFFSET :offset ROWS
  FETCH NEXT :max_page_size ROWS ONLY
) AS t
ORDER BY t.actor_id

That’s it. Impressive? Don’t be scared, I’ll walk you through these things step by step. And if you ever get confused by SQL syntax, consider this article explaining the logical order of SQL operations, which is, for our example:

  • FROM (recurse ordering for derived tables)
  • WHERE (which the example omitted)
  • WINDOW calculations
  • SELECT (the projection)
  • ORDER BY
  • OFFSET .. FETCH

Annotating our query, ordering operations logically as 1.1, 1.2, 2.1, 2.2, 2.3, 2.4, 2.5, 3.1, 3.2, 3.3, 3.4:

-- 3.3
SELECT 
  t.first_name, 
  t.last_name,

  -- 3.2
  COUNT(*) OVER () AS actual_page_size,
  MAX(row) OVER () = total_rows AS last_page,

  -- 3.3
  total_rows,
  row,
  ((row - 1) / :max_page_size) + 1 AS current_page
-- 3.1
FROM (
  -- 2.3
  SELECT
    u.*,

    -- 2.2
    COUNT(*) OVER () AS total_rows,
    ROW_NUMBER () OVER (ORDER BY u.actor_id) AS row

  -- 2.1
  FROM (

    -- 1.2
    SELECT *
    -- 1.1
    FROM actor
  ) AS u

  -- 2.4
  ORDER BY u.actor_id

  -- 2.5
  OFFSET :offset ROWS
  FETCH NEXT :max_page_size ROWS ONLY
) AS t

-- 3.4
ORDER BY t.actor_id

Step by step explanation

First off, the original query SELECT * FROM actor is wrapped as a derived table called u. You can do almost anything you want with this original query, applying only a few transformations:

  • 1.1, 1.2, 2.1: You need to project (SELECT clause) the columns that your original query projected, plus the columns that you need for ORDER BY. Because I’m projecting the right things in the outermost query, and because there’s no DISTINCT clause in the original query, I conveniently projected *. Alternatively, I could have projected FIRST_NAME, LAST_NAME (because that is projected in the original query), and ACTOR_ID (because that’s what we ORDER BY).
  • 2.2: On that derived table u, we’re now able to calculate some metadata, including TOTAL_ROWS as COUNT(*) OVER () and ROW as ROW_NUMBER () OVER (ORDER BY t.actor_id). The COUNT(*) OVER () window function has an empty window specification OVER (), meaning it calculates all the rows that result from the FROM, WHERE, GROUP BY, HAVING clauses, i.e. from u in our particular example. Without a second round-trip! The ROW_NUMBER () OVER (ORDER BY u.actor_id) orders all the rows in u by u.actor_id and assigns unique row numbers to them, according to that ordering.
  • 2.3: The window functions are calculated implicitly because they’re located in the projection of this derived table. We’re also again going to conveniently project everything from u.*, because the outer-most query is the one that projects columns explicitly.
  • 2.4: The original ordering has been moved here because there is no guarantee that the ordering would have been maintained if we had ordered the contents of u. But we need the ordering to calculate OFFSET .. FETCH right after
  • 2.5: This is where we paginate. The OFFSET corresponds to the maximum ROW value that we’ve encountered before. We start at 0, and with a page size of 15, we use 15 on the next page. Remember that while indexes are 1 based in SQL, OFFSET is 0 based.
  • 3.1: All of the above is wrapped again in a derived table, in order to make further calculations on it, namely:
  • 3.2: We can again calculate COUNT(*) OVER (), calculating the total number of rows that result from the FROM, WHERE, GROUP BY, HAVING clauses, i.e. from t in our particular example. This time, the number of rows can be no more than MAX_PAGE_SIZE, because that’s what the FETCH (or LIMIT) clause inside of t says. But it can be less, too, so this is what we use to calculate the ACTUAL_PAGE_SIZE. Finally, we compare MAX(row) OVER () = total_rows to see if we’re on the last page, meaning the highest value for row in the current page resulting from t is compared to the total row count. Another way to calculate the LAST_PAGE value would be if ACTUAL_PAGE_SIZE < MAX_PAGE_SIZE, i.e. COUNT(*) OVER () < :MAX_PAGE_SIZE.
  • 3.3: In addition to the usual projection of the original columns FIRST_NAME, LAST_NAME (we’re no longer projecting * now!), we’re doing some final calculations including dividing ROW / TOTAL_ROWS to get the page number. You can calculate more things, such as TOTAL_ROWS / MAX_PAGE_SIZE to get the TOTAL_PAGES value.
  • 3.4: Finally, we have to ORDER BY t.actor_id again, don’t let anyone tell you otherwise. In SQL, if you do not ORDER BY, then the ordering is undefined. Sure, it would be silly for an optimiser to re-order things without any good reason. We’ve already ordered the contents of our subqueries in 2.4, but there is no guarantee, that this ordering is stable. Just add DISTINCT, UNION, or a JOIN resulting in a hash join or some random other operator to your query, and the ordering breaks. So, always ORDER BY if ordering is important to you.

And we’re done!

How to do it in jOOQ?

This is the kind of use-case where jOOQ really really shines, because all of this is about dynamic SQL. Your actual business logic is contained in the deeply nested u table. Everything else is “presentation logic”, which is implemented in SQL for very obvious reasons: To improve performance.

And because you want to implement all of this only once in some library of yours, instead of having to play this game on every query, you make this kind of query dynamic. The utility will look like this:

// Assuming as always the usual static imports, including:
// import static org.jooq.impl.DSL.*;
// import com.generated.code.Tables.*;

static Select<?> paginate(
    DSLContext ctx,
    Select<?> original,
    Field<?>[] sort,
    int limit,
    int offset
) {
    Table<?> u = original.asTable("u");
    Field<Integer> totalRows = count().over().as("total_rows");
    Field<Integer> row = rowNumber().over().orderBy(u.fields(sort))
        .as("row");

    Table<?> t = ctx
        .select(u.asterisk())
        .select(totalRows, row)
        .from(u)
        .orderBy(u.fields(sort))
        .limit(limit)
        .offset(offset)
        .asTable("t");

    Select<?> result = ctx
        .select(t.fields(original.getSelect().toArray(Field[]::new)))
        .select(
            count().over().as("actual_page_size"),
            field(max(t.field(row)).over().eq(t.field(totalRows)))
                .as("last_page"),
            t.field(totalRows),
            t.field(row),
            t.field(row).minus(inline(1)).div(limit).plus(inline(1))
                .as("current_page"))
        .from(t)
        .orderBy(t.fields(sort));

    // System.out.println(result);
    return result;
}

Notice the println for debugging? It will print again something like our original query (but you’ll also see that in your debug log output, by default, with jOOQ):

select
  t.ACTOR_ID,
  t.FIRST_NAME,
  t.LAST_NAME,
  count(*) over () as actual_page_size,
  (max(t.row) over () = t.total_rows) as last_page,
  t.total_rows,
  t.row,
  ((t.row / 15) + 1) as current_page
from (
  select
    u.*,
    count(*) over () as total_rows,
    row_number() over (order by u.ACTOR_ID) as row
  from (
    select
      ACTOR.ACTOR_ID,
      ACTOR.FIRST_NAME,
      ACTOR.LAST_NAME
    from ACTOR
  ) as u
  order by u.ACTOR_ID
  offset 30 rows
  fetch next 15 rows only
) as t
order by t.ACTOR_ID

And here’s how you call the utility:

System.out.println(
    paginate(
        ctx,
        ctx.select(ACTOR.ACTOR_ID, ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
           .from(ACTOR),
        new Field[] { ACTOR.ACTOR_ID },
        15,
        30
    ).fetch()
);

Notice that you can plug in arbitrary SQL fragments into that utility and paginate them. Irrespective of the complexity (including joins, other window functions, grouping, recursion, and what not), jOOQ will have you covered and will now paginate things for you.

The result of the above is:

+--------+----------+---------+----------------+---------+----------+----+------------+
|ACTOR_ID|FIRST_NAME|LAST_NAME|actual_page_size|last_page|total_rows| row|current_page|
+--------+----------+---------+----------------+---------+----------+----+------------+
|      31|SISSY     |SOBIESKI |              15|false    |       200|  31|           3|
|      32|TIM       |HACKMAN  |              15|false    |       200|  32|           3|
|      33|MILLA     |PECK     |              15|false    |       200|  33|           3|
|      34|AUDREY    |OLIVIER  |              15|false    |       200|  34|           3|
|      35|JUDY      |DEAN     |              15|false    |       200|  35|           3|
|      36|BURT      |DUKAKIS  |              15|false    |       200|  36|           3|
|      37|VAL       |BOLGER   |              15|false    |       200|  37|           3|
|      38|TOM       |MCKELLEN |              15|false    |       200|  38|           3|
|      39|GOLDIE    |BRODY    |              15|false    |       200|  39|           3|
|      40|JOHNNY    |CAGE     |              15|false    |       200|  40|           3|
|      41|JODIE     |DEGENERES|              15|false    |       200|  41|           3|
|      42|TOM       |MIRANDA  |              15|false    |       200|  42|           3|
|      43|KIRK      |JOVOVICH |              15|false    |       200|  43|           3|
|      44|NICK      |STALLONE |              15|false    |       200|  44|           3|
|      45|REESE     |KILMER   |              15|false    |       200|  45|           3|
+--------+----------+---------+----------------+---------+----------+----+------------+

Or, on the last page, with offset 195

+--------+----------+---------+----------------+---------+----------+----+------------+
|ACTOR_ID|FIRST_NAME|LAST_NAME|actual_page_size|last_page|total_rows| row|current_page|
+--------+----------+---------+----------------+---------+----------+----+------------+
|     196|BELA      |WALKEN   |               5|true     |       200| 196|          14|
|     197|REESE     |WEST     |               5|true     |       200| 197|          14|
|     198|MARY      |KEITEL   |               5|true     |       200| 198|          14|
|     199|JULIA     |FAWCETT  |               5|true     |       200| 199|          14|
|     200|THORA     |TEMPLE   |               5|true     |       200| 200|          14|
+--------+----------+---------+----------------+---------+----------+----+------------+

Conclusion

jOOQ is all about dynamic SQL. There’s hardly any SQL feature left that jOOQ doesn’t support. This includes window functions, for example, but also making sure that your dynamic SQL works on a large number of SQL dialects, irrespective of the little syntactic details.

You can build your own libraries to construct re-usable SQL elements from other SQL building blocks as this article has shown, to dynamically create single-query OFFSET pagination meta data calculation, without performing additional database round trips.



from Java, SQL and jOOQ. https://ift.tt/30uzu7A
via IFTTT

No comments:

Post a Comment