Re: [HACKERS] why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

2014-06-10 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> If we could allow NTUP_PER_BUCKET to drop when the hashtable is
> expected to fit in memory either way, perhaps with some safety margin
> (e.g. we expect to use less than 75% of work_mem), I bet that would
> make the people who have been complaining about …
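A minimal sketch of the heuristic being floated here, using hypothetical names (choose_ntup_per_bucket, SAFETY_MARGIN) rather than the actual nodeHash.c API: go dense only when the whole table is expected to fit comfortably under work_mem anyway, so the larger bucket array cannot push the join into extra batches.

    #include <stddef.h>

    #define NTUP_PER_BUCKET_DEFAULT 10
    #define NTUP_PER_BUCKET_DENSE    1
    #define SAFETY_MARGIN          0.75   /* use at most 75% of work_mem */

    static int
    choose_ntup_per_bucket(double ntuples, size_t tuple_width,
                           size_t work_mem_bytes)
    {
        /* projected memory: tuples plus one pointer per bucket */
        double tuple_bytes  = ntuples * tuple_width;
        double bucket_bytes = (ntuples / NTUP_PER_BUCKET_DENSE) * sizeof(void *);

        if (tuple_bytes + bucket_bytes < SAFETY_MARGIN * work_mem_bytes)
            return NTUP_PER_BUCKET_DENSE;   /* plenty of headroom: go dense */
        return NTUP_PER_BUCKET_DEFAULT;     /* memory is tight: stay coarse */
    }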

Re: [HACKERS] why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

2014-06-10 Thread Robert Haas
On Tue, Jun 10, 2014 at 10:46 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Jun 10, 2014 at 10:27 AM, Tom Lane wrote:
>>> I don't really recall any hard numbers being provided. I think if we
>>> looked at some results that said "here's the average gain, and here's
>>> the worst-case loss, and here's an estimate of how often you'd hit
>>> the worst case …

Re: [HACKERS] why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

2014-06-10 Thread Tom Lane
Robert Haas writes:
> On Tue, Jun 10, 2014 at 10:27 AM, Tom Lane wrote:
>> I don't really recall any hard numbers being provided. I think if we
>> looked at some results that said "here's the average gain, and here's
>> the worst-case loss, and here's an estimate of how often you'd hit
>> the worst case …

Re: [HACKERS] why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

2014-06-10 Thread Robert Haas
On Tue, Jun 10, 2014 at 10:27 AM, Tom Lane wrote:
>> Well, it's all in what scenario you test, right? If you test the case
>> where something overflows work_mem as a result of the increased size
>> of the bucket array, it's always going to suck. And if you test the
>> case where that doesn't happen …
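To see why the overflow scenario always loses, here is a back-of-envelope illustration with made-up numbers (100,000 tuples of 40 bytes, 4 MB of work_mem): the same build side fits at NTUP_PER_BUCKET = 10 but overflows at NTUP_PER_BUCKET = 1, purely because of the larger bucket-pointer array.

    #include <stdio.h>

    int main(void)
    {
        const double ntuples  = 100000.0;
        const double tuple_sz = 40.0;               /* bytes per hashed tuple */
        const double work_mem = 4.0 * 1024 * 1024;  /* 4 MB */
        const int settings[]  = {10, 1};

        for (int i = 0; i < 2; i++)
        {
            double nbuckets = ntuples / settings[i];
            double total    = ntuples * tuple_sz + nbuckets * sizeof(void *);

            printf("NTUP_PER_BUCKET=%2d: %.0f bytes needed -> %s work_mem\n",
                   settings[i], total,
                   total <= work_mem ? "fits in" : "overflows");
        }
        return 0;
    }

With 8-byte pointers this prints roughly 4,080,000 bytes (fits) versus 4,800,000 bytes (overflows), against a 4,194,304-byte work_mem.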

Re: [HACKERS] why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

2014-06-10 Thread Tom Lane
Robert Haas writes:
> On Mon, Jun 9, 2014 at 11:09 AM, Tom Lane wrote:
>> I'm quite prepared to believe that we should change NTUP_PER_BUCKET ...
>> but appealing to standard advice isn't a good basis for arguing that.
>> Actual performance measurements (in both batched and unbatched cases)
>> would …

Re: [HACKERS] why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

2014-06-10 Thread Robert Haas
On Mon, Jun 9, 2014 at 11:09 AM, Tom Lane wrote:
> Keep in mind that that standard advice is meant for all-in-memory cases,
> not for cases where the alternative to running with longer hash chains
> is dumping tuples out to disk and reading them back.

Sure, but that doesn't help someone who sets …

Re: [HACKERS] why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

2014-06-09 Thread Tom Lane
Robert Haas writes:
> This has come up before. Basically, the problem is that if you reduce
> NTUP_PER_BUCKET, the bucket array gets larger, which might reduce the
> amount of space available for tuples to the point where the hash join
> overflows to multiple batches. That will be more expensive …
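A simplified sketch of that mechanism (an assumed approximation of the sizing logic in nodeHash.c, not the verbatim code): whatever work_mem remains after the bucket array is carved out determines how many batches the tuples must be split into, and shrinking NTUP_PER_BUCKET shrinks that remainder.

    #include <stddef.h>

    static int
    estimate_nbatch(double ntuples, size_t tuple_width,
                    size_t work_mem_bytes, int ntup_per_bucket)
    {
        double bucket_bytes = (ntuples / ntup_per_bucket) * sizeof(void *);
        double tuple_bytes  = ntuples * tuple_width;
        double avail        = (double) work_mem_bytes - bucket_bytes;
        int    nbatch       = 1;

        if (avail <= 0)
            return -1;              /* bucket array alone exceeds work_mem */

        /* Double nbatch until one batch's worth of tuples fits; keeping it
         * a power of 2 lets a tuple's batch be taken from its hash bits. */
        while (tuple_bytes / nbatch > avail)
            nbatch *= 2;
        return nbatch;
    }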

Re: [HACKERS] why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

2014-06-09 Thread Robert Haas
On Mon, Jun 9, 2014 at 4:06 AM, b8flowerfire wrote:
> When I read the source code for the hash join, I was quite confused that
> PostgreSQL defines NTUP_PER_BUCKET as 10.
> Since this value is used to estimate the tuple count in one bucket,
> wouldn't it be better to use a smaller value?
> …

[HACKERS] why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

2014-06-09 Thread b8flowerfire
When I read the source code for the hash join, I was quite confused that PostgreSQL defines NTUP_PER_BUCKET as 10. Since this value is used to estimate the tuple count in one bucket, wouldn't it be better to use a smaller value? I have not done any experiments yet, but it seems that we could a…
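For readers less familiar with the code, this is the role NTUP_PER_BUCKET plays: it is the target average hash-chain length used by ExecChooseHashTableSize() in src/backend/executor/nodeHash.c when sizing the bucket array. A simplified sketch of that idea (the real function also clamps the result and accounts for work_mem and batching):

    #include <math.h>

    #define NTUP_PER_BUCKET 10   /* target average tuples per hash bucket */

    /* Aim for chains of about NTUP_PER_BUCKET entries: more buckets mean
     * shorter chains but a bigger bucket-pointer array. */
    static long
    choose_nbuckets(double ntuples)
    {
        return (long) ceil(ntuples / NTUP_PER_BUCKET);
    }

So halving NTUP_PER_BUCKET doubles the bucket count, which is exactly the memory trade-off debated upthread.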