Re: [PATCHES] hash index improving v3

2008-09-04 Thread Zdenek Kotala
Alex Hunsaker napsal(a): On Thu, Sep 4, 2008 at 9:48 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: Ok here are the results: (data generated from the c program before) select count(1) from test_hash; count --- 10011 create index test_hash_num_idx on test_hash using hash (num); CV

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 9:48 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote: Ok here are the results: (data generated from the c program before) select count(1) from test_hash; count --- 10011 create index test_hash_num_idx on test_hash using hash (num); CVS: Time: 698065.180 ms patc

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 8:17 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > I guess one thing we could do for testing purposes is lobotomize one of > the datatype-specific hash functions. For instance, make int8_hash > return the input mod 2^32, ignoring the upper bytes. Then it'd be easy > to compute

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes: > On Thu, Sep 4, 2008 at 7:13 PM, Tom Lane <[EMAIL PROTECTED]> wrote: >> * check that the queries actually use the indexes (not sure that the >> proposed switch settings ensure this, not to mention you didn't create >> the indexes) > Well I was assuming

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes: > On Thu, Sep 4, 2008 at 7:45 PM, Tom Lane <[EMAIL PROTECTED]> wrote: >> So what we need for testing is a few different key values that hash to >> the same code. Not sure about an easy way to find such. > Hrm, well I have not really looked at the hash a

Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

2008-09-04 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > After reading the wikipedia article on Boyer-Moore search algorithm, it > looks to me like this patch actually implements the simpler > Boyer-Moore-Horspool algorithm that only uses one lookup table. That's > probably fine, as it ought to be faste

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 7:45 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > So what we need for testing is a few different key values that hash to > the same code. Not sure about an easy way to find such. Hrm, well I have not really looked at the hash algorithm but I assume we could just reduce the num

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 7:13 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Alex Hunsaker" <[EMAIL PROTECTED]> writes: >> Ok let me know if this is to naive of an approach or not hitting the >> right cases you want tested. > > You have the unique-versus-not dimension, but I'm also wondering about > narr

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
I wrote: > You have the unique-versus-not dimension, On second thought, actually not. What we want to look at is the penalty for false matches due to *distinct* key values that happen to have the same hash codes. Your test case for all-the-same is using all the same key values, which means it'll

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes: > Ok let me know if this is to naive of an approach or not hitting the > right cases you want tested. You have the unique-versus-not dimension, but I'm also wondering about narrow vs wide index keys (say about 8 bytes vs 50-100 or so). In the former cas

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Alex Hunsaker
On Thu, Sep 4, 2008 at 5:11 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > So my thinking right now is that we should just test this patch as-is. > If it doesn't show really horrid performance when there are lots of > hash key collisions, we should forget the store-both-things idea and > just go with th

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
Here is an updated patch incorporating Zdenek's review, my own observation that we should make the index tupledesc tell the truth, and some other fixes/improvements such as making backwards scans work as expected. The main thing lacking before this could be committed, from a code standpoint, is a

Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

2008-09-04 Thread Heikki Linnakangas
After reading the wikipedia article on Boyer-Moore search algorithm, it looks to me like this patch actually implements the simpler Boyer-Moore-Horspool algorithm that only uses one lookup table. That's probably fine, as it ought to be faster on small needles and haystacks because it requires l

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes: > pgsql/src/backend/access/hash/hashutil.c > > It would be better remove #define from hash.h and setup it there > directly. Actually, I don't like this aspect of the patch one bit: it means that the

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes: > I performed code review and see my comments. Thanks for the comments. I've incorporated all of these into an updated patch that I'm preparing, except for > Why not define new datatype for example HashKey instead of uint32? This seems like a good

Re: [PATCHES] hash index improving v3

2008-09-04 Thread Zdenek Kotala
I performed code review and see my comments. pgsql/src/backend/access/hash/hashpage.c use sizeof() or something better the 4. pgsql/src/backend/access/hash/hashpage.c New empty line