Re: fix cost subqueryscan wrong parallel cost

2022-05-04 Thread Tom Lane
Robert Haas writes: > On Tue, May 3, 2022 at 2:13 PM Tom Lane wrote: >> In any case, fundamental redesign of what EXPLAIN prints is a job >> for v16 or later. Are you okay with the proposed patch as a v15 fix? > Yes. I can't really vouch for it, but I don't object to it. I re-read the patch

Re: fix cost subqueryscan wrong parallel cost

2022-05-03 Thread Robert Haas
On Tue, May 3, 2022 at 2:13 PM Tom Lane wrote: > In any case, fundamental redesign of what EXPLAIN prints is a job > for v16 or later. Are you okay with the proposed patch as a v15 fix? Yes. I can't really vouch for it, but I don't object to it. -- Robert Haas EDB: http://www.enterprisedb.com

Re: fix cost subqueryscan wrong parallel cost

2022-05-03 Thread Tom Lane
Robert Haas writes: > That I don't like at all. I'm still of the opinion that it's a huge > mistake for EXPLAIN to print int(rowcount/loops) instead of just > rowcount. The division is never what I want and in my experience is > also not what other people want and often causes confusion. Both the

Re: fix cost subqueryscan wrong parallel cost

2022-05-03 Thread Robert Haas
On Mon, May 2, 2022 at 5:24 PM Tom Lane wrote: > I did look at the rest of costsize.c for similar instances, and didn't > find any. In any case, I think we have two options: > > 1. Apply this fix, and in future fix any other places that we identify > later. > > 2. Invent some entirely new scheme

Re: fix cost subqueryscan wrong parallel cost

2022-05-02 Thread Tom Lane
Robert Haas writes: > I am not sure whether this is actually correct, but it seems a lot > more believable than the previous proposals. The problem might be more > general, though. I think when I developed this parallel query stuff I > modeled a lot of it on what you did for parameterized paths.

Re: fix cost subqueryscan wrong parallel cost

2022-05-02 Thread Robert Haas
On Fri, Apr 29, 2022 at 3:38 PM Tom Lane wrote: > I wrote: > > So perhaps we should do it more like the attached, which produces > > this plan for the UNION case: > > sigh ... actually attached this time. I am not sure whether this is actually correct, but it seems a lot more believable than the

Re: fix cost subqueryscan wrong parallel cost

2022-04-29 Thread David G. Johnston
On Fri, Apr 29, 2022 at 12:31 PM Tom Lane wrote: > "David G. Johnston" writes: > > The fact that (baserel.rows > path->subpath->rows) here seems like a > > straight bug: there are no filters involved in this case but in the > > presence of filters baserel->rows should be strictly (<= > >

Re: fix cost subqueryscan wrong parallel cost

2022-04-29 Thread Tom Lane
I wrote: > So perhaps we should do it more like the attached, which produces > this plan for the UNION case: sigh ... actually attached this time. regards, tom lane diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index

Re: fix cost subqueryscan wrong parallel cost

2022-04-29 Thread Tom Lane
"David G. Johnston" writes: > The fact that (baserel.rows > path->subpath->rows) here seems like a > straight bug: there are no filters involved in this case but in the > presence of filters baserel->rows should be strictly (<= > path->subpath->rows), right? No, because the subpath's rowcount

Re: fix cost subqueryscan wrong parallel cost

2022-04-29 Thread David G. Johnston
On Fri, Apr 29, 2022 at 11:09 AM Tom Lane wrote: > > In short, these SubqueryScans are being labeled as producing 6 rows > when their input only produces 25000 rows, which is surely insane. > > So: even though the SubqueryScan itself isn't parallel-aware, the number > of rows it processes

Re: fix cost subqueryscan wrong parallel cost

2022-04-29 Thread Tom Lane
I wrote: > So I think the actual problem here is that we leave the decision > to elide no-op SubqueryScan nodes till setrefs.c. We should detect > that earlier, and when it applies, label the SubqueryScanPath with > the exact cost of its child. Hmm ... actually, while doing that seems like it'd

Re: fix cost subqueryscan wrong parallel cost

2022-04-29 Thread Tom Lane
I wrote: > -ENOCAFFEINE, sorry about that. As penance for that blunder, I spent a little time looking into this issue (responding to Robert's upthread complaint that it wasn't explained clearly). See the test case in parallel_subquery.sql, attached, which is adapted from the test in

Re: fix cost subqueryscan wrong parallel cost

2022-04-29 Thread Tom Lane
I wrote: > Really? Maybe I misunderstand the case under consideration, but > what I think will be happening is that each worker will re-execute > the pushed-down subquery in full. Oh ... nevermind that: what I was thinking about was the SubLink/SubPlan case, which is unrelated to SubqueryScan.

Re: fix cost subqueryscan wrong parallel cost

2022-04-29 Thread David G. Johnston
On Fri, Apr 29, 2022 at 7:02 AM Tom Lane wrote: > Richard Guo writes: > > Currently subquery scan is using rel->rows (if no parameterization), > > which I believe is not correct. That's not the size the subquery scan > > node in each worker needs to handle, as the rows have been divided > >

Re: fix cost subqueryscan wrong parallel cost

2022-04-29 Thread Tom Lane
Richard Guo writes: > Currently subquery scan is using rel->rows (if no parameterization), > which I believe is not correct. That's not the size the subquery scan > node in each worker needs to handle, as the rows have been divided > across workers by the parallel-aware node. Really? Maybe I

Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-29 Thread Richard Guo
On Fri, Apr 29, 2022 at 12:53 AM Robert Haas wrote: > Gather doesn't require a parallel aware subpath, just a parallel-safe > subpath. In a case like this, the parallel seq scan will divide the > rows from the underlying relation across the three processes executing > it. Each process will pass

Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-28 Thread David G. Johnston
On Thu, Apr 28, 2022 at 9:53 AM Robert Haas wrote: > On Fri, Apr 22, 2022 at 11:55 AM David G. Johnston > wrote: > > On Wed, Apr 20, 2022 at 11:38 PM bu...@sohu.com wrote: > >> > >> > > for now fuction cost_subqueryscan always using *total* rows even > parallel > >> > > path. like this: > >> >

Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-28 Thread Robert Haas
On Fri, Apr 22, 2022 at 11:55 AM David G. Johnston wrote: > On Wed, Apr 20, 2022 at 11:38 PM bu...@sohu.com wrote: >> >> > > for now fuction cost_subqueryscan always using *total* rows even parallel >> > > path. like this: >> > > >> > > Gather (rows=3) >> > > Workers Planned: 2 >> > > ->

Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-22 Thread David G. Johnston
On Wed, Apr 20, 2022 at 11:38 PM bu...@sohu.com wrote: > > > for now fuction cost_subqueryscan always using *total* rows even > parallel > > > path. like this: > > > > > > Gather (rows=3) > > > Workers Planned: 2 > > > -> Subquery Scan (rows=3) -- *total* rows, should be equal >

Re: fix cost subqueryscan wrong parallel cost

2022-04-21 Thread bucoo
> > > Suppose parallelism is not in use and that param_info is NULL. Then, > > > is path->subpath->rows guaranteed to be equal to baserel->rows? If > > > yes, then we don't need to a three-part if statement as you propose > > > here and can just change the "else" clause to say path->path.rows = >

Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-21 Thread Robert Haas
On Thu, Apr 21, 2022 at 2:38 AM bu...@sohu.com wrote: > > Suppose parallelism is not in use and that param_info is NULL. Then, > > is path->subpath->rows guaranteed to be equal to baserel->rows? If > > yes, then we don't need to a three-part if statement as you propose > > here and can just

Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-21 Thread bu...@sohu.com
> > for now fuction cost_subqueryscan always using *total* rows even parallel > > path. like this: > > > > Gather (rows=3) > > Workers Planned: 2 > > -> Subquery Scan (rows=3) -- *total* rows, should be equal subpath > > -> Parallel Seq Scan (rows=1) > > OK, that's

Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-20 Thread Robert Haas
On Wed, Apr 20, 2022 at 10:01 AM bu...@sohu.com wrote: > for now fuction cost_subqueryscan always using *total* rows even parallel > path. like this: > > Gather (rows=3) > Workers Planned: 2 > -> Subquery Scan (rows=3) -- *total* rows, should be equal subpath > -> Parallel

Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-20 Thread bu...@sohu.com
> Sure, but that doesn't make the patch correct. The patch proposes > that, when parallelism in use, a subquery scan will produce fewer rows > than when parallelism is not in use, and that's 100% false. Compare > this with the case of a parallel sequential scan. If a table contains > 1000 rows,

Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-18 Thread Robert Haas
On Fri, Apr 15, 2022 at 6:06 AM bu...@sohu.com wrote: > > Generally it should be. But there's no subquery scan visible here. > I wrote a patch for distinct/union and aggregate support last year(I want > restart it again). > https://www.postgresql.org/message-id/2021091517250848215321%40sohu.com

Re: fix cost subqueryscan wrong parallel cost

2022-04-15 Thread Justin Pryzby
On Fri, Apr 15, 2022 at 05:16:44PM +0800, Richard Guo wrote: > Not related to this topic but I noticed another problem from the plan. > Note the first Sort node which is to unique-ify the result of the UNION. > Why cannot we re-arrange the sort keys from (a, b, c) to (a, c, b) so > that we can

Re: Re: fix cost subqueryscan wrong parallel cost

2022-04-15 Thread bu...@sohu.com
> Generally it should be. But there's no subquery scan visible here. I wrote a patch for distinct/union and aggregate support last year(I want restart it again). https://www.postgresql.org/message-id/2021091517250848215321%40sohu.com If not apply this patch, some parallel paths will naver be

Re: fix cost subqueryscan wrong parallel cost

2022-04-15 Thread Richard Guo
On Fri, Apr 15, 2022 at 12:50 AM Robert Haas wrote: > On Tue, Apr 12, 2022 at 2:57 AM bu...@sohu.com wrote: > > The cost_subqueryscan function does not judge whether it is parallel. > > I don't see any reason why it would need to do that. A subquery scan > isn't parallel aware. > > > regress >

Re: fix cost subqueryscan wrong parallel cost

2022-04-14 Thread Robert Haas
On Tue, Apr 12, 2022 at 2:57 AM bu...@sohu.com wrote: > The cost_subqueryscan function does not judge whether it is parallel. I don't see any reason why it would need to do that. A subquery scan isn't parallel aware. > regress > -- Incremental sort vs. set operations with varno 0 > set

fix cost subqueryscan wrong parallel cost

2022-04-12 Thread bu...@sohu.com
The cost_subqueryscan function does not judge whether it is parallel. regress -- Incremental sort vs. set operations with varno 0 set enable_hashagg to off; explain (costs off) select * from t union select * from t order by 1,3; QUERY PLAN