Hello hackers, I don't like this:
  ->  Parallel Hash  (... rows=416667 ...) (... rows=333333 ...)

I think the logic in get_parallel_divisor() only makes sense for queries
like this (output with nearby patch to show leader contribution):

postgres=# set parallel_tuple_cost = 0;
SET
postgres=# set parallel_setup_cost = 0;
SET
postgres=# explain (analyze, verbose) select * from t;
                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..8591.67 rows=1000000 width=4) (actual time=0.460..411.944 rows=1000000 loops=1)
   Output: i
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on public.t  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.066..136.777 rows=333333 loops=3)
         Output: i
         Leader: actual time=0.022..8.061 rows=30058 loops=1      <--- poor contribution
         Worker 0: actual time=0.150..201.007 rows=502574 loops=1
         Worker 1: actual time=0.027..201.263 rows=467368 loops=1
 Planning Time: 0.071 ms
 Execution Time: 495.700 ms
(11 rows)

For anything that consumes its entire input before emitting a tuple, for
example Partial Aggregate, Sort and Hash, it doesn't make sense to mangle
the child paths' row estimates.  For example:

   ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=263.402..263.403 rows=333333 loops=3)
         Output: t2.i
         Buckets: 131072  Batches: 16  Memory Usage: 3520kB
         Leader: actual time=266.861..266.861 rows=341888 loops=1     <--- equal contribution
         Worker 0: actual time=261.521..261.522 rows=322276 loops=1
         Worker 1: actual time=261.824..261.825 rows=335836 loops=1

get_parallel_divisor() effectively assumes that every tuple emitted by the
path will cause a tuple to reach the Gather node, at which point the
"gather distraction" quotient needs to be estimated, so it does that up
front.  Whether that really happens depends on where the path finishes up
being used.
I wonder if it would be better to get rid of that logic completely, and
instead tweak the gather path's run cost to account for the processing
asymmetry.  In cases where the leader contributes very little, you could
argue that it's not OK to ignore leader distraction in child paths (and
therefore to use avg(cardinality), not max(cardinality), over all
processes in, say, a nestloop), but on the other hand, for cases that use
it for something important like estimating memory consumption (hash, sort,
agg), that's exactly what you want anyway, because they're greedy.  With
this approach, I suspect Gather Merge doesn't need to do anything at all,
because there we expect the leader to be forced to contribute equally by
the ordering requirement.

Here's an experimental patch to show what I mean.  Not tested much, just
trying out the idea.
tweak-partial-cardinality.patch
Description: Binary data