Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-28 Thread Jeff Janes
On Mon, Jan 27, 2014 at 10:00 AM, Simon Riggs wrote: > On 27 January 2014 17:44, Pavel Stehule wrote: > > > This topic is interesting - we found very bad performance with hashing > large > > tables with high work_mem. MergeJoin with quicksort was significantly > > faster. > > I've seen this also

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-28 Thread Jeremy Harris
On 27/01/14 18:00, Simon Riggs wrote: On 27 January 2014 17:44, Pavel Stehule wrote: This topic is interesting - we found very bad performance with hashing large tables with high work_mem. MergeJoin with quicksort was significantly faster. I've seen this also. I didn't deeper research - th

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-27 Thread Simon Riggs
On 27 January 2014 17:44, Pavel Stehule wrote: > This topic is interesting - we found very bad performance with hashing large > tables with high work_mem. MergeJoin with quicksort was significantly > faster. I've seen this also. > I didn't deeper research - there is a possibility of virtualizat

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-27 Thread Pavel Stehule
2014-01-27 Stephen Frost > * Simon Riggs (si...@2ndquadrant.com) wrote: > > I don't see anything for 9.4 in here now. > > Attached is what I was toying with (thought I had attached it previously > somewhere.. perhaps not), but in re-testing, it doesn't appear to do > enough to move things in the

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-27 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote: > * Simon Riggs (si...@2ndquadrant.com) wrote: > > I don't see anything for 9.4 in here now. > > Attached [...] I'm apparently bad at this 'attaching' thing, particularly on this subject. Here it is. Thanks, Stephen colordiff

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-27 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote: > I don't see anything for 9.4 in here now. Attached is what I was toying with (thought I had attached it previously somewhere.. perhaps not), but in re-testing, it doesn't appear to do enough to move things in the right direction in all cases. I did

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-27 Thread Simon Riggs
On 25 January 2014 23:08, Simon Riggs wrote: > On 25 January 2014 22:33, Stephen Frost wrote: > >> * Tom Lane (t...@sss.pgh.pa.us) wrote: > >>> AFAICT, there was no consensus in this thread on what to do, which >>> probably has something to do with the lack of concrete performance >>> tests prese

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-26 Thread Atri Sharma
Sent from my iPad > On 26-Jan-2014, at 4:38, Simon Riggs wrote: > >> On 25 January 2014 22:33, Stephen Frost wrote: >> >> * Tom Lane (t...@sss.pgh.pa.us) wrote: > >>> AFAICT, there was no consensus in this thread on what to do, which >>> probably has something to do with the lack of concret

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-25 Thread Simon Riggs
On 25 January 2014 22:33, Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> AFAICT, there was no consensus in this thread on what to do, which >> probably has something to do with the lack of concrete performance >> tests presented to back up any particular proposal. > > This I en

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-25 Thread Stephen Frost
Tom, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > In the end, I believe we absolutely should do something about this. > > Hashing a 64M-row table (requiring upwards of 8G) instead of hashing > > a 2M-row table is really bad of us. > > Huh? I don't see anything in the threa

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-25 Thread Tom Lane
Stephen Frost writes: > * Bruce Momjian (br...@momjian.us) wrote: >> Uh, were are we on this? Is it a TODO? > I've been strongly considering my previous patch which tweaked > NTUP_PER_BUCKET to '1' (instead of the default '10') when there's > sufficient work_mem for it. There was recently anoth

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-25 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote: > Uh, were are we on this? Is it a TODO? I've been strongly considering my previous patch which tweaked NTUP_PER_BUCKET to '1' (instead of the default '10') when there's sufficient work_mem for it. There was recently another complaint on IRC about our te

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-25 Thread Bruce Momjian
Uh, were are we on this? Is it a TODO? --- On Wed, Jul 3, 2013 at 01:39:28PM +0530, Atri Sharma wrote: > Hi all, > > I have been working on a patch for the above discussed > functionalities. I made an array of int32s, one

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-07-03 Thread Atri Sharma
Hi all, I have been working on a patch for the above discussed functionalities. I made an array of int32s, one for each bucket in a hash table(the array is per hash table). I am using the hash value directly to set the corresponding bit in the bit field.Specifically: bitfield_orvalue = 1

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-26 Thread Jeff Janes
On Sun, Jun 23, 2013 at 2:45 AM, Simon Riggs wrote: > On 23 June 2013 03:16, Stephen Frost wrote: > > > Will think on it more. > > Some other thoughts related to this... > > * Why are we building a special kind of hash table? Why don't we just > use the hash table code that we in every other pla

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-26 Thread Atri Sharma
> > Isn't it currently the last week of this month? :) I'm guessing you > mean July. Heh,no.I lose track of time these days. Alright, second week of July then. > I really don't see that happening, to be honest.. I think it would be > interesting to try some of the surrogate-additional-hashing

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-26 Thread Stephen Frost
* Atri Sharma (atri.j...@gmail.com) wrote: > Right, let me look.Although, I am pretty busy atm with ordered set > functions, so will get it done maybe last week of this month. Isn't it currently the last week of this month? :) I'm guessing you mean July. > Another thing I believe in is that we s

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-26 Thread Atri Sharma
On Wed, Jun 26, 2013 at 9:20 PM, Stephen Frost wrote: > * Atri Sharma (atri.j...@gmail.com) wrote: >> My point is that I would like to help in the implementation, if possible. :) > > Feel free to go ahead and implement it.. I'm not sure when I'll have a > chance to (probably not in the next week

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-26 Thread Stephen Frost
* Atri Sharma (atri.j...@gmail.com) wrote: > My point is that I would like to help in the implementation, if possible. :) Feel free to go ahead and implement it.. I'm not sure when I'll have a chance to (probably not in the next week or two anyway). Unfortunately, the bigger issue here is really

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-26 Thread Atri Sharma
On Wed, Jun 26, 2013 at 7:12 PM, Stephen Frost wrote: > Atri, > > * Atri Sharma (atri.j...@gmail.com) wrote: >> I just popped in here on Simon's advice to put an idea I had about >> optimizing hash joins on this thread. > > I'd encourage reading the thread a bit first, in the future.. :) > Yeah, I

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-26 Thread Stephen Frost
Atri, * Atri Sharma (atri.j...@gmail.com) wrote: > I just popped in here on Simon's advice to put an idea I had about > optimizing hash joins on this thread. I'd encourage reading the thread a bit first, in the future.. :) > Essentially, I was thinking of using bloom filters in the part where >

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-26 Thread Atri Sharma
On Sun, Jun 23, 2013 at 8:41 PM, Heikki Linnakangas wrote: > On 23.06.2013 01:48, Simon Riggs wrote: >> >> On 22 June 2013 21:40, Stephen Frost wrote: >> >>> I'm actually not a huge fan of this as it's certainly not cheap to do. If >>> it >>> can be shown to be better than an improved heuristic t

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-23 Thread Heikki Linnakangas
On 23.06.2013 01:48, Simon Riggs wrote: On 22 June 2013 21:40, Stephen Frost wrote: I'm actually not a huge fan of this as it's certainly not cheap to do. If it can be shown to be better than an improved heuristic then perhaps it would work but I'm not convinced. We need two heuristics, it w

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-23 Thread Stephen Frost
On Sunday, June 23, 2013, Simon Riggs wrote: > On 23 June 2013 03:16, Stephen Frost > > wrote: > > > Will think on it more. > > Some other thoughts related to this... > > * Why are we building a special kind of hash table? Why don't we just > use the hash table code that we in every other place in

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-23 Thread Stephen Frost
On Sunday, June 23, 2013, Simon Riggs wrote: > On 23 June 2013 03:16, Stephen Frost > > wrote: > > > Still doesn't really address the issue of dups though. > > Checking for duplicates in all cases would be wasteful, since often we > are joining to the PK of a smaller table. Well, that's what nd

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-23 Thread Simon Riggs
On 23 June 2013 03:16, Stephen Frost wrote: > Will think on it more. Some other thoughts related to this... * Why are we building a special kind of hash table? Why don't we just use the hash table code that we in every other place in the backend. If that code is so bad why do we use it everywhe

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-22 Thread Simon Riggs
On 23 June 2013 03:16, Stephen Frost wrote: > Still doesn't really address the issue of dups though. Checking for duplicates in all cases would be wasteful, since often we are joining to the PK of a smaller table. If duplicates are possible at all for a join, then it would make sense to build

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-22 Thread Stephen Frost
On Saturday, June 22, 2013, Simon Riggs wrote: > On 22 June 2013 21:40, Stephen Frost > > wrote: > > > I'm actually not a huge fan of this as it's certainly not cheap to do. > If it > > can be shown to be better than an improved heuristic then perhaps it > would > > work but I'm not convinced. > >

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-22 Thread Stephen Frost
On Saturday, June 22, 2013, Heikki Linnakangas wrote: > On 22.06.2013 19:19, Simon Riggs wrote: > >> So I think that (2) is the best route: Given that we know with much >> better certainty the number of rows in the scanned-relation, we should >> be able to examine our hash table after it has been

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-22 Thread Simon Riggs
On 22 June 2013 21:40, Stephen Frost wrote: > I'm actually not a huge fan of this as it's certainly not cheap to do. If it > can be shown to be better than an improved heuristic then perhaps it would > work but I'm not convinced. We need two heuristics, it would seem: * an initial heuristic to

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-22 Thread Heikki Linnakangas
On 22.06.2013 19:19, Simon Riggs wrote: So I think that (2) is the best route: Given that we know with much better certainty the number of rows in the scanned-relation, we should be able to examine our hash table after it has been built and decide whether it would be cheaper to rebuild the hash t

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-22 Thread Stephen Frost
On Saturday, June 22, 2013, Simon Riggs wrote: > On 22 June 2013 14:48, Stephen Frost > > wrote: > > > Based on your argument that we want to have a bucket load which is, on > > average, the size of NTUP_PER_BUCKET, I have to '-1' on this. > > That is not my argument. I am pointing out that the co

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-22 Thread Robert Haas
On Sat, Jun 22, 2013 at 9:48 AM, Stephen Frost wrote: >> The correct calculation that would match the objective set out in the >> comment would be >> >> dbuckets = (hash_table_bytes / tupsize) / NTUP_PER_BUCKET; > > This looks to be driving the size of the hash table size off of "how > many of th

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-22 Thread Robert Haas
On Sat, Jun 22, 2013 at 9:15 AM, Simon Riggs wrote: > Previous discussions of Hash Joins have noted that the performance > decreases when the average number of tuples per bucket increases. > O(N^2) effects are seen. > > We've argued this about many different ways, yet all of those > discussions ha

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-22 Thread Simon Riggs
On 22 June 2013 14:48, Stephen Frost wrote: > Based on your argument that we want to have a bucket load which is, on > average, the size of NTUP_PER_BUCKET, I have to '-1' on this. That is not my argument. I am pointing out that the comments claim the code does that, and yet the code does not.

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2013-06-22 Thread Stephen Frost
Simon, * Simon Riggs (si...@2ndquadrant.com) wrote: > Previous discussions of Hash Joins have noted that the performance > decreases when the average number of tuples per bucket increases. Having the hash table so small that we have hash bucket collisions with different 32bit hash values is extre