* Robert Haas (robertmh...@gmail.com) wrote:
> If we could allow NTUP_PER_BUCKET to drop when the hashtable is
> expected to fit in memory either way, perhaps with some safety margin
> (e.g. we expect to use less than 75% of work_mem), I bet that would
> make the people who have been complaining about ...
On Tue, Jun 10, 2014 at 10:46 AM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Jun 10, 2014 at 10:27 AM, Tom Lane wrote:
>>> I don't really recall any hard numbers being provided. I think if we
>>> looked at some results that said "here's the average gain, and here's
>>> the worst-case loss, and here's an estimate of how often you'd hit
>>> the worst case ...
Robert Haas writes:
> On Tue, Jun 10, 2014 at 10:27 AM, Tom Lane wrote:
>> I don't really recall any hard numbers being provided. I think if we
>> looked at some results that said "here's the average gain, and here's
>> the worst-case loss, and here's an estimate of how often you'd hit
>> the worst case ...
On Tue, Jun 10, 2014 at 10:27 AM, Tom Lane wrote:
>> Well, it's all in what scenario you test, right? If you test the case
>> where something overflows work_mem as a result of the increased size
>> of the bucket array, it's always going to suck. And if you test the
>> case where that doesn't happen ...
Robert Haas writes:
> On Mon, Jun 9, 2014 at 11:09 AM, Tom Lane wrote:
>> I'm quite prepared to believe that we should change NTUP_PER_BUCKET ...
>> but appealing to standard advice isn't a good basis for arguing that.
>> Actual performance measurements (in both batched and unbatched cases)
>> would ...
On Mon, Jun 9, 2014 at 11:09 AM, Tom Lane wrote:
> Keep in mind that that standard advice is meant for all-in-memory cases,
> not for cases where the alternative to running with longer hash chains
> is dumping tuples out to disk and reading them back.
Sure, but that doesn't help someone who sets ...
Robert Haas writes:
> This has come up before. Basically, the problem is that if you reduce
> NTUP_PER_BUCKET, the bucket array gets larger, which might reduce the
> amount of space available for tuples to the point where the hash join
> overflows to multiple batches. That will be more expensive
On Mon, Jun 9, 2014 at 4:06 AM, b8flowerfire wrote:
> When I read the hash join source code, I was confused to see that
> PostgreSQL defines the NTUP_PER_BUCKET value as 10.
> Since this value is used to estimate the number of tuples in one bucket,
> wouldn't a smaller value be better?
>
When I read the hash join source code, I was confused to see that
PostgreSQL defines the NTUP_PER_BUCKET value as 10.
Since this value is used to estimate the number of tuples in one bucket,
wouldn't a smaller value be better?
I have not run any experiments, but it seems that we could ...