Re: [HACKERS] Solving hash table overrun problems

2005-03-06 Thread Aaron Birkland
This also brings up a line of thought I had a while ago on a related topic. Something like a HashDistinct might be useful, if it had no startup cost. We already have that: the planner will use a HashAgg node in this fashion in some contexts (I think just as one of the ways to do IN,

Re: [HACKERS] Solving hash table overrun problems

2005-03-04 Thread Tom Lane
Aaron Birkland [EMAIL PROTECTED] writes: This also brings up a line of thought I had a while ago on a related topic. Something like a HashDistinct might be useful, if it had no startup cost. It would basically be a plan node in the executor that would dynamically build a hashtable so that it
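A minimal sketch of the HashDistinct idea described above, in C (illustrative only, not PostgreSQL executor code; all names here are hypothetical). Rows are emitted the first time their key is seen, so output can begin immediately with no startup cost; the sketch deliberately ignores what happens when the table outgrows memory, which is the overrun problem this thread is about.

/* Hypothetical HashDistinct sketch: stream rows, emit each key only once. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NSLOTS 1024                     /* power of 2, so (NSLOTS - 1) is a mask */

typedef struct
{
    uint64_t keys[NSLOTS];
    bool     used[NSLOTS];
} HashSet;

/* Insert key; return true if it was new, false if already present. */
static bool
hashset_insert(HashSet *set, uint64_t key)
{
    uint64_t slot = (key * 0x9E3779B97F4A7C15ULL) & (NSLOTS - 1);

    while (set->used[slot])
    {
        if (set->keys[slot] == key)
            return false;               /* duplicate: do not emit the row */
        slot = (slot + 1) & (NSLOTS - 1);   /* linear probing */
    }
    set->used[slot] = true;
    set->keys[slot] = key;
    return true;
}

int
main(void)
{
    HashSet  set;
    uint64_t input[] = {42, 7, 42, 13, 7, 99};

    memset(&set, 0, sizeof(set));
    for (size_t i = 0; i < sizeof(input) / sizeof(input[0]); i++)
    {
        if (hashset_insert(&set, input[i]))
            printf("emit %llu\n", (unsigned long long) input[i]);
    }
    return 0;
}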

Re: [HACKERS] Solving hash table overrun problems

2005-03-04 Thread Bruno Wolff III
On Thu, Mar 03, 2005 at 17:05:40 -0500, Tom Lane [EMAIL PROTECTED] wrote: * Estimate the number of batches N using the planner's estimate. We will always choose N to be a power of 2. A tuple's batch number is ((H div K) mod N). If K is way low, this could be very slow. Is there a way to do
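In C, the batching arithmetic quoted above comes down to a pair of one-liners (a sketch, not PostgreSQL source). Here H is the tuple's hash value, K the number of in-memory hash buckets, and N the number of batches; the bucket formula H mod K is an assumption on my part, since only the batch formula appears in the quoted text. Choosing N as a power of 2 turns the modulo into a bit mask.

#include <stdint.h>

/* Batch a tuple belongs to: (H div K) mod N, with N a power of 2. */
static inline uint32_t
batch_number(uint32_t H, uint32_t K, uint32_t N)
{
    return (H / K) & (N - 1);
}

/* Assumed bucket choice within the in-memory table: H mod K.
 * If K is estimated way too low, many tuples land in each bucket. */
static inline uint32_t
bucket_number(uint32_t H, uint32_t K)
{
    return H % K;
}

Dividing H by K before taking the batch number means the batch and the (assumed) bucket draw on different parts of the hash value, so the two choices do not collapse into one another.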

Re: [HACKERS] Solving hash table overrun problems

2005-03-04 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: * Estimate the number of batches N using the planner's estimate. We will always choose N to be a power of 2. A tuple's batch number is ((H div K) mod N). If K is way low, this could be very slow. How so? You're not

Re: [HACKERS] Solving hash table overrun problems

2005-03-04 Thread Bruno Wolff III
On Fri, Mar 04, 2005 at 10:42:08 -0500, Tom Lane [EMAIL PROTECTED] wrote: Bruno Wolff III [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: * Estimate the number of batches N using the planner's estimate. We will always choose N to be a power of 2. A tuple's batch number is ((H

Re: [HACKERS] Solving hash table overrun problems

2005-03-04 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: Bruno Wolff III [EMAIL PROTECTED] writes: If K is way low, this could be very slow. How so? You're not concerned about the time to do the division itself, are you? No, rather having lots of entries in the same hash

[HACKERS] Solving hash table overrun problems

2005-03-03 Thread Tom Lane
We saw a case recently where a hash join was using much more memory than it was supposed to, causing failure when the server ran out of memory. The hash join code is supposed to spill tuples to disk when the hashtable exceeds work_mem, but this failed to save us because the algorithm is not
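For readers following along, here is a hypothetical C sketch (not the actual PostgreSQL hash-join code) of the spill path being described: a tuple whose batch is the one currently being built goes into the in-memory hash table, and everything else is appended to that batch's temp file for a later pass. The helper functions and constants are placeholders; the sketch only shows where spilling happens, not why the existing algorithm failed to bound memory.

#include <stdint.h>
#include <stdio.h>

#define NBATCH   4                   /* number of batches N, a power of 2 */
#define NBUCKETS 1024                /* in-memory hash buckets K */

typedef struct
{
    uint32_t hash;                   /* stand-in for a real join tuple */
} Tuple;

/* Placeholder for adding a tuple to the in-memory hash table. */
static void
add_to_memory_hashtable(const Tuple *tup)
{
    printf("tuple %u kept in memory\n", tup->hash);
}

/* Placeholder for appending a tuple to a batch's temp file on disk. */
static void
write_to_batch_file(uint32_t batchno, const Tuple *tup)
{
    printf("tuple %u spilled to temp file for batch %u\n", tup->hash, batchno);
}

static void
insert_tuple(const Tuple *tup, uint32_t curbatch)
{
    uint32_t batchno = (tup->hash / NBUCKETS) & (NBATCH - 1);  /* (H div K) mod N */

    if (batchno == curbatch)
        add_to_memory_hashtable(tup);
    else
        write_to_batch_file(batchno, tup);
}

int
main(void)
{
    Tuple t1 = {100};                /* batch 0: stays in memory   */
    Tuple t2 = {2048};               /* batch 2: written to a file */

    insert_tuple(&t1, 0);
    insert_tuple(&t2, 0);
    return 0;
}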

Re: [HACKERS] Solving hash table overrun problems

2005-03-03 Thread Aaron Birkland
We saw a case recently where a hash join was using much more memory than it was supposed to, causing failure when the server ran out of memory. Yes. I had the same problem a few months ago, http://archives.postgresql.org/pgsql-general/2004-09/msg00410.php It turned out that the cost estimates