About

Friday, June 4, 2021

How to Prevent Execution Plan Troubles when Querying Skewed Data, with jOOQ

One of the biggest advantages of using jOOQ is that you can change all of your complex application’s generated SQL with just a few lines of code. In this article, we’ll look into how to solve some common bind peeking issues just like that, without touching your application code, without the need to explain this advanced SQL performance issue to every team member, and best of all: for good.

What are Bind Values Good For?

First of all, bind variables are a good thing. They:

  • Prevent SQL injection
  • Ensure syntactic correctness
  • Improve performance in DBMS with execution plan caches

The latter bullet doesn’t affect all dialects, just like this article doesn’t. Commercial DBMS like Oracle, SQL Server, etc. ship with a powerful execution plan cache. They are designed to run thousands of distinct queries with very complex plans. Planning these queries takes time (I’ve seen Oracle SQL being planned for 5 seconds!) and you don’t want the DBMS to re-execute this planning work every time the query is run, which may be thousands of times per second.

Usually, this execution plan cache takes the SQL string (or a hash of it), and associates meta data like alternative possible execution plans to it. When the SQL string changes, the cache lookup fails and the “new” query has to be planned again.

I say “new”, because it might be the “same” query to the user, even if the string is different. For example:

SELECT * FROM book WHERE id = 1;
SELECT * FROM book WHERE id = 2;

Now we have two times the “same” query, but each of them are “new”. Oracle will plan them both afresh. So, we use bind variables, instead:

SELECT * FROM book WHERE id = ?;
SELECT * FROM book WHERE id = ?;

What is the Bind Peeking Issue?

In some cases, the cached plan is not optimal. This is the case when the actual bind value is significant to the planning, e.g. the value 1 would produce a vastly different plan than the value 2, or much more likely the value DELETED produces a different plan from PROCESSED or NEW.

This problem has been discussed in our previous blog post Why You Should Design Your Database to Optimise for Statistics.

“Bind Peeking” is a technique by the Oracle database (it is done by others, too, but maybe not called “Bind Peeking”) to have a “peek” at the bind variable to get a more accurate plan than the average plan if we don’t know the bind value. This can go both ways, good or bad, so there were numerous fixes / patches / workarounds in previous Oracle versions. Some interesting articles on the topic:

Databases slowly get to truly adaptive query execution models where execution plans can be fixed in-flight when the estimates were clearly wrong. Db2 is quite strong at this, and Oracle is getting better.

But even then, some times planners get it wrong, simply because they can’t reasonably estimate the cardinalities produced by a simple predicate like

WHERE x = ?

… just because the overall query is very complex and some SQL transformation did not apply.

Preventing the Problem by Avoiding Bind Values

Again. Please use bind values by default. They’re a good thing by default. Not all data is as skewed as the one I presented in my other blog post. But some data is almost always skewed: Enumeration types.

When you have an enum like:

enum ProcessingState {
  NEW,
  PROCESSING,
  EXECUTED,
  DELETED
}

Or in PostgreSQL:

CREATE TYPE processing_state AS ENUM (
  'new',
  'processing',
  'executed',
  'deleted'
);

Or even just encoded as a CHECK constraint:

CREATE TABLE transaction (
  -- ...
  processing_state VARCHAR(10) CHECK (processing_state IN (
    'new',
    'processing',
    'executed',
    'deleted'
  ))
  -- ...
);

In that case, you will very likely have highly skewed data. For example, a quick query might yield:

SELECT processing_state, count(*)
FROM transaction
GROUP BY processing_state

Resulting in:

+------------------+----------+
| processing_state |    count |
+------------------+----------+
| new              |    10234 |
| processing       |       15 |
| executed         | 17581684 |
| deleted          |    83193 |
+------------------+----------+

Now, do you think you’ll profit from indexing PROCESSING_STATE equally, when looking for NEW or PROCESSING versus when looking for EXECUTED values? Do you want the same plans? Do you want an average plan, which might not use the index, when in fact you should use it (looking for PROCESSING)?

Not only that, but it’s also quite unlikely that your queries are so generic as that the individual PROCESSING_STATE values can be used interchangeably. For example, a query looking for DELETED states might be run by a housekeeping batch job that wants to remove the logically deleted transactions for good. It will never query anything other than DELETED states. So, might as well inline, right?

Now, if you write a query like this:

SELECT *
FROM transaction
WHERE processing_state = 'processing';

With jOOQ, you can create an “inline” on a per-query basis using DSL.inline("processing") (as opposed to DSL.val("processing"), which is used by default, implicitly:

// These generate a ? bind value
ctx.selectFrom(TRANSACTION)
   .where(TRANSACTION.PROCESSING_STATE.eq("processing"))
   .fetch();

ctx.selectFrom(TRANSACTION)
   .where(TRANSACTION.PROCESSING_STATE.eq(val("processing")))
   .fetch();

// This creates an inline 'processing' literal
ctx.selectFrom(TRANSACTION)
   .where(TRANSACTION.PROCESSING_STATE.eq(inline("processing")))
   .fetch();

As always, assuming this static import:

import static org.jooq.impl.DSL.*;

But now, you have to think of doing that every single time you query PROCESSING_STATE, and all the other similar columns.

Prevent it for Good

Much better, prevent it for good. You can create a very simple Binding like this:

class AlwaysInlineStringBinding implements Binding<String, String> {

    final Binding<?, String> delegate = VARCHAR.getBinding();

    @Override
    public Converter<String, String> converter() {
        return Converters.identity(String.class);
    }

    @Override
    public void sql(BindingSQLContext<String> ctx) 
    throws SQLException {
        ctx.render().visit(inline(ctx.value()));
    }

    @Override
    public void register(BindingRegisterContext<String> ctx) 
    throws SQLException {
        delegate.register(ctx);
    }

    // No need to set anything
    @Override
    public void set(BindingSetStatementContext<String> ctx) 
    throws SQLException {}

    @Override
    public void set(BindingSetSQLOutputContext<String> ctx) 
    throws SQLException {
        delegate.set(ctx);
    }

    @Override
    public void get(BindingGetResultSetContext<String> ctx) 
    throws SQLException {
        delegate.get(ctx);
    }

    @Override
    public void get(BindingGetStatementContext<String> ctx) 
    throws SQLException {
        delegate.get(ctx);
    }

    @Override
    public void get(BindingGetSQLInputContext<String> ctx) 
    throws SQLException {
        delegate.get(ctx);
    }
}

Or, starting from jOOQ 3.15, even simpler, and generic:

class AlwaysInlineStringBinding 
extends DefaultBinding<String, String> {
    public AlwaysInlineStringBinding() {
        super(DefaultBinding.binding(VARCHAR));
    }

    @Override
    public void sql(BindingSQLContext<String> ctx) 
    throws SQLException {
        ctx.render().visit(inline(ctx.value()));
    }

    // No need to set anything
    @Override
    public void set(BindingSetStatementContext<T> ctx) 
    throws SQLException {}
}

Or even generic:

class AlwaysInlineBinding<T> extends DefaultBinding<T, T> {
    public AlwaysInlineBinding(DataType<T> type) {
        super(DefaultBinding.binding(type));
    }

    @Override
    public void sql(BindingSQLContext<T> ctx) 
    throws SQLException {
        ctx.render().visit(inline(ctx.value()));
    }

    // No need to set anything
    @Override
    public void set(BindingSetStatementContext<T> ctx) 
    throws SQLException {}
}

All this does is generate inline values instead of the ? bind parameter marker, and skip setting any value to the JDBC PreparedStatement (or reactive R2DBC Statement, starting from jOOQ 3.15. This will work just the same!)

Try it very easily, yourself (using the jOOQ 3.15 version):

@Test
public void testAlwaysInlineBinding() {
    DSLContext ctx = DSL.using(DEFAULT);
    DataType<Integer> t = INTEGER.asConvertedDataType(
        new AlwaysInlineBinding<>(INTEGER));

    Field<Integer> i = field("i", INTEGER);
    Field<Integer> j = field("j", t);
    Param<Integer> a = val(1);
    Param<Integer> b = val(1, INTEGER.asConvertedDataType(
        new AlwaysInlineBinding<>(INTEGER)));

    // Bind value by default
    assertEquals("?", ctx.render(a));
    assertEquals("1", ctx.renderInlined(a));
    assertEquals("1", ctx.render(b));
    assertEquals("1", ctx.renderInlined(b));

    // Bind value by default in predicates
    assertEquals("i = ?", ctx.render(i.eq(a)));
    assertEquals("i = 1", ctx.renderInlined(i.eq(a)));
    assertEquals("i = 1", ctx.render(i.eq(b)));
    assertEquals("i = 1", ctx.renderInlined(i.eq(b)));
    assertEquals("i = ?", ctx.render(i.eq(1)));
    assertEquals("i = 1", ctx.renderInlined(i.eq(1)));

    // No more bind values in predicates!
    assertEquals("j = 1", ctx.render(j.eq(a)));
    assertEquals("j = 1", ctx.renderInlined(j.eq(a)));
    assertEquals("j = 1", ctx.render(j.eq(b)));
    assertEquals("j = 1", ctx.renderInlined(j.eq(b)));
    assertEquals("j = 1", ctx.render(j.eq(1)));
    assertEquals("j = 1", ctx.renderInlined(j.eq(1)));
}

Of course, instead of doing the above programmatically, you’ll attach this Binding to all relevant columns using the code generator’s forced type configuration

Conclusion

Please use bind values by default. Both in jOOQ and elsewhere. It’s a very good default.

But sometimes, your data is skewed, and you as a developer, you probably know that. In those cases, sometimes, “inline values” as we call them (or constants, literals, etc.) may be the better choice to help the optimiser get better at estimating. Even if the optimiser’s estimates are good the first time, the plan may switch in production for some weird reason, including some plans being purged because the cache is full, or the DBA clicked a button, or whatever.

And that’s when your query might be unnecessarily slow all of a sudden. No more need. When you have enum types, or similar, just use the above simple trick, apply to all of your schema where it makes sense, and forget this problem for good.

Side note:

Of course, the other way round is just as easy. When you have inline literals that you want to switch over to bind values, you can do it just the same way, e.g. when you use jOOQ’s parsing connection to translate between dialects, or to patch your wrong ORM-generated SQL!



from Java, SQL and jOOQ. https://ift.tt/3g3usGA
via IFTTT

No comments:

Post a Comment