I’ve recently discovered a pleasant new addition to PostgreSQL 14: the new enable_memoize flag, which improves the performance of some nested loop joins where statistics hint at this being appropriate. I mean, who can resist the temptation of a tweet reporting a 1000x query speedup?
Improving query speed by 1000x hints at something very suboptimal having been going on before, and a tool like memoization can be of great help. But will it also help with an “ordinary” join? I wanted to try it myself.
What’s memoization?
In a perfect world free of side effects (and SQL is such a perfect world, in theory), memoization means that we can substitute y for f(x) in any computation, given that y = f(x). For example, no matter how many times you calculate UPPER('x'), you’ll always get 'X'. If the calculation of such a function is costly, and there are only few possible input values, then why not just maintain a hash map that maps all previous input values to their results, and use it to look up known (or at least frequent) values instead of computing them again?
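To illustrate the idea in SQL terms, here’s a conceptual sketch only (expensive_function is a made-up placeholder, and the real Memoize node caches per-row lookups at execution time rather than rewriting the query):

-- Instead of evaluating the hypothetical expensive_function() once per row...
SELECT i, expensive_function(j) FROM t;

-- ...a memoizing strategy conceptually evaluates it only once per DISTINCT
-- input value and reuses the cached result for all other rows:
SELECT t.i, m.result
FROM t
JOIN (
  SELECT j, expensive_function(j) AS result
  FROM (SELECT DISTINCT j FROM t) AS d
) AS m ON m.j = t.j;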
As I’ve shown previously on this blog, Oracle 11g has introduced a feature called scalar subquery caching, a feature which you can activate in jOOQ to avoid costly PL/SQL context switches.
In the case of PostgreSQL’s enable_memoize, this can be particularly useful for nested loop joins in SQL, and to reference the above tweet, a lateral join is often executed via a nested loop join.
Turning the feature on and off
I’ve created a schema like this:
CREATE TABLE t AS
SELECT i, i % 5 AS j
FROM generate_series(1, 100000) AS t(i);
CREATE TABLE u AS
SELECT i, i % 20000 as j
FROM generate_series(1, 100000) AS t(i);
CREATE INDEX uj ON u(j);
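Since the optimiser’s decision to add a Memoize node depends on statistics (in particular, the estimated number of distinct values of the cache key), it may be worth refreshing them explicitly after creating the tables. This is merely a precaution on a fresh schema, not a requirement:

-- Refresh planner statistics, so the estimated numbers of
-- distinct values of t.j and u.j are accurate
ANALYZE t;
ANALYZE u;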
In summary:
- Both tables t and u have 100000 rows.
- t.j has only 5 distinct values; each value appears 20000 times.
- u.j has 20000 distinct values; each value appears 5 times.
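These cardinalities are easy to verify with a quick sanity check (not part of the benchmark itself):

SELECT
  (SELECT count(DISTINCT j) FROM t) AS distinct_t_j, -- 5
  (SELECT count(DISTINCT j) FROM u) AS distinct_u_j; -- 20000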
When running this on PostgreSQL 14:
SELECT current_setting('enable_memoize');
I get:
|current_setting|
|---------------|
|on             |
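Alternatively, the same setting can be inspected with SHOW:

SHOW enable_memoize;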
So, the feature is active, which I can also see in an EXPLAIN of the following query:
EXPLAIN
SELECT *
FROM t JOIN u ON t.j = u.j;
The plan is:
|QUERY PLAN                                                            |
|----------------------------------------------------------------------|
|Nested Loop  (cost=0.30..8945.41 rows=496032 width=16)                |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)         |
|  ->  Memoize  (cost=0.30..0.41 rows=5 width=8)                       |
|        Cache Key: t.j                                                |
|        ->  Index Scan using uj on u  (cost=0.29..0.40 rows=5 width=8)|
|              Index Cond: (j = t.j)                                   |
Without memoization, when joining the two tables like that, then for each of the 100000 rows in t, I have to look up the 5 matching rows in u, i.e. 100000 lookups. But if memoization kicks in, then I will have to perform the lookup only 5 times, because there are only 5 distinct values of t.j.
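This can be verified with EXPLAIN ANALYZE, where the Memoize node reports its cache statistics. If memoization works as expected here, the output should contain something along the lines of Hits: 99995 and Misses: 5 (exact formatting may vary), confirming that only 5 actual index lookups were performed:

EXPLAIN ANALYZE
SELECT *
FROM t JOIN u ON t.j = u.j;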
We can play around with execution plans by turning the feature on or off:
SET enable_memoize = ON;
SET enable_memoize = OFF;
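Note that SET changes the setting for the rest of the session. To experiment without affecting subsequent statements, SET LOCAL limits the change to the current transaction:

BEGIN;
SET LOCAL enable_memoize = OFF;

-- This plan is produced with memoization disabled
EXPLAIN SELECT * FROM t JOIN u ON t.j = u.j;

-- The setting reverts automatically at COMMIT or ROLLBACK
COMMIT;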
When turned off, PostgreSQL seems to choose a hash join or a merge join instead on my machine (between multiple executions, the plan might switch):
|QUERY PLAN                                                         |
|-------------------------------------------------------------------|
|Hash Join  (cost=3084.00..11568.51 rows=499351 width=16)           |
|  Hash Cond: (t.j = u.j)                                           |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)      |
|  ->  Hash  (cost=1443.00..1443.00 rows=100000 width=8)            |
|        ->  Seq Scan on u  (cost=0.00..1443.00 rows=100000 width=8)|

|QUERY PLAN                                                              |
|------------------------------------------------------------------------|
|Merge Join  (cost=9748.11..763846.11 rows=50000000 width=16)            |
|  Merge Cond: (u.j = t.j)                                               |
|  ->  Index Scan using uj on u  (cost=0.29..3848.29 rows=100000 width=8)|
|  ->  Sort  (cost=9747.82..9997.82 rows=100000 width=8)                 |
|        Sort Key: t.j                                                   |
|        ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)     |
Let’s Benchmark
We’re using the usual benchmark technique described here:
- We repeat an operation 25x in mode A and mode B and compare (or more than 25, if it’s a fast operation)
- We repeat the above 5x to mitigate any warmup and other caching effects
You can run the following benchmark on the above schema yourself, to verify:
DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 25;
  rec RECORD;
BEGIN
  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP

    -- Statement 1: the join without memoization
    v_ts := clock_timestamp();
    SET enable_memoize = OFF;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT t.*
        FROM t JOIN u ON t.j = u.j
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);

    -- Statement 2: the same join with memoization
    v_ts := clock_timestamp();
    SET enable_memoize = ON;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT t.*
        FROM t JOIN u ON t.j = u.j
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
    RAISE INFO '';
  END LOOP;
END$$;
On my machine, the results are consistent, decent, not too impressive, but still significant:
Run 1, Statement 1: 00:00:03.763426
Run 1, Statement 2: 00:00:03.401346
Run 2, Statement 1: 00:00:03.769419
Run 2, Statement 2: 00:00:03.375677
Run 3, Statement 1: 00:00:03.771465
Run 3, Statement 2: 00:00:03.374413
Run 4, Statement 1: 00:00:03.769136
Run 4, Statement 2: 00:00:03.398734
Run 5, Statement 1: 00:00:03.772544
Run 5, Statement 2: 00:00:03.375272
That’s roughly a 10% speedup. Across the whole system, that alone would already be worth it.
Optimising LATERAL
Let’s try optimising LATERAL instead. We could run a query like this:
SELECT *
FROM
  t,
  LATERAL (
    SELECT count(*)
    FROM u
    WHERE t.j = u.j
  ) AS u(j);
The EXPLAIN of the above is:
|QUERY PLAN                                                                       |
|---------------------------------------------------------------------------------|
|Nested Loop  (cost=4.40..3969.47 rows=100000 width=16)                           |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)                    |
|  ->  Memoize  (cost=4.40..4.42 rows=1 width=8)                                  |
|        Cache Key: t.j                                                           |
|        ->  Aggregate  (cost=4.39..4.40 rows=1 width=8)                          |
|              ->  Index Only Scan using uj on u  (cost=0.29..4.38 rows=5 width=0)|
|                    Index Cond: (j = t.j)                                        |
So, we can again cache the computation of the COUNT(*) value for each of the 5 distinct t.j input values, rather than re-calculating it every time. Surely, this must be even better than before?
Benchmark time!
DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 25;
  rec RECORD;
BEGIN
  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP

    -- Statement 1: the LATERAL join without memoization
    v_ts := clock_timestamp();
    SET enable_memoize = OFF;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT *
        FROM
          t,
          LATERAL (
            SELECT count(*)
            FROM u
            WHERE t.j = u.j
          ) AS u(j)
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);

    -- Statement 2: the same LATERAL join with memoization
    v_ts := clock_timestamp();
    SET enable_memoize = ON;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT *
        FROM
          t,
          LATERAL (
            SELECT count(*)
            FROM u
            WHERE t.j = u.j
          ) AS u(j)
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
    RAISE INFO '';
  END LOOP;
END$$;
And this time, we can see a significant speedup!
Run 1, Statement 1: 00:00:03.419728
Run 1, Statement 2: 00:00:01.083941
Run 2, Statement 1: 00:00:03.404954
Run 2, Statement 2: 00:00:01.098404
Run 3, Statement 1: 00:00:03.425725
Run 3, Statement 2: 00:00:01.093883
Run 4, Statement 1: 00:00:03.441691
Run 4, Statement 2: 00:00:01.127837
Run 5, Statement 1: 00:00:03.420172
Run 5, Statement 2: 00:00:01.097943
That’s great news! But wait, does this also work for ordinary correlated subqueries? Because the above LATERAL correlated subquery could be rewritten as:
SELECT
  t.*,
  (
    SELECT count(*)
    FROM u
    WHERE t.j = u.j
  ) j
FROM t;
Regrettably, the plan doesn’t show memoization:
|QUERY PLAN                                                                   |
|-----------------------------------------------------------------------------|
|Seq Scan on t  (cost=0.00..441693.00 rows=100000 width=16)                   |
|  SubPlan 1                                                                  |
|    ->  Aggregate  (cost=4.39..4.40 rows=1 width=8)                          |
|          ->  Index Only Scan using uj on u  (cost=0.29..4.38 rows=5 width=0)|
|                Index Cond: (j = t.j)                                        |
And the benchmark (you can paste the query into the benchmark logic yourself to reproduce) confirms there’s no memoization effect:
Run 1, Statement 1: 00:00:03.617562
Run 1, Statement 2: 00:00:03.605765
Run 2, Statement 1: 00:00:03.610084
Run 2, Statement 2: 00:00:03.682064
Run 3, Statement 1: 00:00:03.725952
Run 3, Statement 2: 00:00:03.705622
Run 4, Statement 1: 00:00:03.672669
Run 4, Statement 2: 00:00:03.644612
Run 5, Statement 1: 00:00:03.645741
Run 5, Statement 2: 00:00:03.642717
Perhaps, with this new feature, correlated subqueries could be rewritten to nested loop outer joins in the future? Other optimisers already do this, and then we would effectively have the same feature here as Oracle’s scalar subquery caching.
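In the meantime, that rewrite can be applied by hand. The above scalar subquery is equivalent to a LEFT JOIN LATERAL .. ON TRUE (the outer join preserves every row of t even if the subquery produced no row, although an aggregate like COUNT(*) always produces exactly one row anyway), and that form profits from memoization as shown before:

SELECT t.*, c.j
FROM t
LEFT JOIN LATERAL (
  SELECT count(*)
  FROM u
  WHERE t.j = u.j
) AS c(j) ON TRUE;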
Conclusion
The feature is turned on by default in PostgreSQL 14. Apart from some additional memory consumption, which might be a small problem if the optimiser is wrong and statistics are off, I don’t see any drawback of this new feature. SQL is a (in theory) side-effect free 4GL, meaning the optimiser can replace any computation by a cached value that depends only on the computation’s input values.
A correlated subquery is a function whose input parameters are the predicates and other references to the outer query’s columns. As such, the result of a correlated subquery can be cached, or memoized. As shown above, this has drastic effects on your present day SQL queries, from which you can profit just by upgrading to PostgreSQL 14.