Re: [HACKERS] Combining hash values

2016-08-01 Thread Tom Lane
Robert Haas writes: > On Mon, Aug 1, 2016 at 11:27 AM, Tom Lane wrote: >> Perhaps, but this'd break existing hash indexes. That might not be >> a fatal objection, but if we're going to put users through that >> I'd like to think a little bigger in terms of the benefits we get. >> I've thought fo

Re: [HACKERS] Combining hash values

2016-08-01 Thread Robert Haas
On Mon, Aug 1, 2016 at 7:24 AM, Dean Rasheed wrote: > On 1 August 2016 at 08:19, Greg Stark wrote: >> Surely combining multiple hashes is the same problem as hashing a block of >> memory? Shouldn't we just use the same algorithm as hash_any()? > > Yes, I imagine that should work well. I suspect t

Re: [HACKERS] Combining hash values

2016-08-01 Thread Robert Haas
On Mon, Aug 1, 2016 at 11:27 AM, Tom Lane wrote: > Dean Rasheed writes: >> On that subject, while looking at hashfunc.c, I spotted that >> hashint8() has a very obvious deficiency, which causes disastrous >> performance with certain inputs: > > Well, if you're trying to squeeze 64 bits into a 32-

Re: [HACKERS] Combining hash values

2016-08-01 Thread Tom Lane
Dean Rasheed writes: > On that subject, while looking at hashfunc.c, I spotted that > hashint8() has a very obvious deficiency, which causes disastrous > performance with certain inputs: Well, if you're trying to squeeze 64 bits into a 32-bit result, there are always going to be collisions somewh

Re: [HACKERS] Combining hash values

2016-08-01 Thread Tom Lane
Greg Stark writes: > I was originally going to suggest using a crc to combine but iirc we > changed hash_any() a while back and decided against using crc. I don't know > if that was wise but wouldn't want to suggest relitigating that. Nah, CRCs are designed to solve a different problem, ie detec

Re: [HACKERS] Combining hash values

2016-08-01 Thread Dean Rasheed
On 1 August 2016 at 08:19, Greg Stark wrote: > Surely combining multiple hashes is the same problem as hashing a block of > memory? Shouldn't we just use the same algorithm as hash_any()? > Yes, I imagine that should work well. I suspect that Thomas's proposal would also probably work well, as wo

Re: [HACKERS] Combining hash values

2016-08-01 Thread Greg Stark
Surely combining multiple hashes is the same problem as hashing a block of memory? Shouldn't we just use the same algorithm as hash_any()? I was originally going to suggest using a crc to combine but iirc we changed hash_any() a while back and decided against using crc. I don't know if that was w

Re: [HACKERS] Combining hash values

2016-07-31 Thread Thomas Munro
On Mon, Aug 1, 2016 at 4:34 AM, Tom Lane wrote: > Thomas Munro writes: >> 2. I suspect that this algorithm for combining hashes is weak, and >> could amplify weaknesses in the hash functions feeding it. > > Very possibly, but ... Concrete example: suppose a clever data type author defines a per

Re: [HACKERS] Combining hash values

2016-07-31 Thread Tom Lane
Thomas Munro writes: > 2. I suspect that this algorithm for combining hashes is weak, and > could amplify weaknesses in the hash functions feeding it. Very possibly, but ... > Compare > Boost's hash_combine, which does some more work before XORing: > seed ^= hash_value(v) + 0x9e3779b9 + (s

[HACKERS] Combining hash values

2016-07-30 Thread Thomas Munro
Hi hackers, Andres Freund asked me off list what I thought about this part of execGrouping.c, which builds a hash from the hashes of several datums in a loop: /* rotate hashkey left 1 bit at each step */ hashkey = (hashkey << 1) | ((hashkey & 0x8000) ? 1 : 0); ...