Re: [HACKERS] Hash index todo list item

2007-11-14 Thread Kenneth Marshall
On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote: On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote: This is a great starting point. I would appreciate it if you have the time and could make it apply cleanly to HEAD. Just to give you an update on this, I'll try to find

Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Gokulakannan Somasundaram
I have not followed this thread very closely. But just wanted to give my inputs. From the results obtained, the average of all the hash probes is 141.8ms, the average for btree is 168.5, a difference of about 27.The standard deviations are about 23, so this is a statistically significant

Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Jens-Wolfhard Schicke
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Shreya Bhargava wrote: Note that the bottom line for the problems with hash indexes is that the current implementation doesn't offer any advantages over btree indexes. Hash indexes need to be not only as good of a choice as btree indexes but

Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Heikki Linnakangas
Shreya Bhargava wrote: 1. Populate the table with 80 million tuples. 2. Create HASH index on the table. 3. clear both linux cache psql buffers. (exiting psql and restarting it cleared the psql buffers; to clear linux cache, we used drop_cache command) 4. start psql 5. select on an

Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Tom Lane
Shreya Bhargava [EMAIL PROTECTED] writes: We performed some probe tests on our patch on hash index and the original btree index to compare the performance between the two. Interesting, but ... From the results obtained, the average of all the hash probes is 141.8ms, the average for btree

Re: [HACKERS] Hash index todo list item

2007-11-05 Thread Shreya Bhargava
Note that the bottom line for the problems with hash indexes is that the current implementation doesn't offer any advantages over btree indexes. Hash indexes need to be not only as good of a choice as btree indexes but significantly better a significantly better choice at least for some specific

Re: [HACKERS] Hash index todo list item

2007-10-22 Thread Kenneth Marshall
On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote: On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote: This is a great starting point. I would appreciate it if you have the time and could make it apply cleanly to HEAD. Just to give you an update on this, I'll try to find

Re: [HACKERS] Hash index todo list item

2007-10-21 Thread Tom Raney
Kenneth, I just pushed the revised patch (v2!). The revised approach samples the parent relation to estimate the number of tuples rather than performing a complete scan. In my tests, the estimate appears to be accurate, erring on the larger side, which is fine. Tom, That is great. I am

Re: [HACKERS] Hash index todo list item

2007-10-21 Thread Kenneth Marshall
Tom, Thank you for the update. I am currently working on updating the patch Neil Conway sent in against 8.0-ish that stores only the hash in the index and locates the entries within the page using a binary search. Then I will fold in your recent update. On Sun, Oct 21, 2007 at 01:13:48PM -0700,

Re: [HACKERS] Hash index todo list item

2007-10-17 Thread Kenneth Marshall
Tom, That is great. I am looking forward to your patch. After the issues that you needed to address, I think that it would be reasonable to add a few more user settings for the hash index. Fill-factor is too course a knob. The others that I have been considering are: maxtuples - Not really the

Re: [HACKERS] Hash index todo list item

2007-10-12 Thread Kenneth Marshall
On Wed, Sep 05, 2007 at 03:07:03PM -0500, Kenneth Marshall wrote: On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote: Dear PostgreSQL Hackers: After following the hackers mailing list for quite a while, I am going to start investigating what will need to be done to

Re: [HACKERS] Hash index todo list item

2007-09-28 Thread Bruce Momjian
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --- Tom Raney wrote: We are pleased to announce an upcoming patch to the hash index code which improves build time and

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Kenneth Marshall
On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: We are pleased to announce an upcoming patch to the hash index code which improves build time and index size, based on this item in the TODO list: During index creation, pre-sort the tuples to improve build speed

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Gregory Stark
Kenneth Marshall [EMAIL PROTECTED] writes: On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: Using our implementation, build times and index sizes are comparable with btree index build times and index sizes. ... That is super! (and timely) It is pretty super. I have a few comments

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Kenneth Marshall
On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote: Kenneth Marshall [EMAIL PROTECTED] writes: On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: Using our implementation, build times and index sizes are comparable with btree index build times and index sizes. ...

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Michael Glaesemann
On Sep 25, 2007, at 11:26 , Kenneth Marshall wrote: Although I am very excited about this patch, I do not see any real value in including it in 8.3. I don't think you have to worry about it being in 8.3. Feature freeze was months ago. Michael Glaesemann grzm seespotcode net

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Tom Raney
Kenneth Marshall wrote: On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote: Kenneth Marshall [EMAIL PROTECTED] writes: On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: Using our implementation, build times and index sizes are comparable with btree index

Re: [HACKERS] Hash index todo list item

2007-09-20 Thread Tom Raney
We are pleased to announce an upcoming patch to the hash index code which improves build time and index size, based on this item in the TODO list: During index creation, pre-sort the tuples to improve build speed http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php Using our

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Jens-Wolfhard Schicke
--On Samstag, September 08, 2007 18:56:23 -0400 Mark Mielke [EMAIL PROTECTED] wrote: Kenneth Marshall wrote: Along with the hypothetical performance wins, the hash index space efficiency would be improved by a similar factor. Obviously, all of these ideas would need to be tested in various

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Jens-Wolfhard Schicke
More random thoughts: - Hash-Indices are best for unique keys, but every table needs a new hash key, which means one more random page access. Is there any way to build multi-_table_ indices? A join might then fetch all table rows with a given unique key after one page fetch for the combined

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Mark Mielke
Kenneth Marshall wrote: On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this point? Sounds pretty good. Let me brain-dump one item on you: one

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Martijn van Oosterhout
On Sat, Sep 08, 2007 at 06:56:23PM -0400, Mark Mielke wrote: I think that if the case of 1 entry per hash becomes common enough to be significant, and the key is stored in the hash, that a btree will perform equal or better, and there is no point in pursuing such a hash index model. This

Re: [HACKERS] Hash index todo list item

2007-09-09 Thread Kenneth Marshall
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this point? Sounds pretty good. Let me brain-dump one item on you: one thing that hash currently has over

Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote: Kenneth Marshall wrote: I understand that a hash value is a many-to-one mapping. That is the point of the flag in the index. The flag means that there is only one item in the heap corresponding to that hash value. In this case we

Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Mark Mielke
Kenneth Marshall wrote: Continuing this train of thought While it would make sense for larger keys to store the hash in the index, if the key is smaller, particularly if it is of fixed size, it would make sense to store the key in the index instead. This would have the benefit of allowing

Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Kenneth Marshall
On Sat, Sep 08, 2007 at 05:14:09PM -0400, Mark Mielke wrote: Kenneth Marshall wrote: Continuing this train of thought While it would make sense for larger keys to store the hash in the index, if the key is smaller, particularly if it is of fixed size, it would make sense to store the key

Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Mark Mielke
Kenneth Marshall wrote: The value of storing the actual value, possibly as an option, is that the key check can be done in the index without requiring a heap lookup to check the actual value which would be a benefit for a unique index. I agree that supporting variable length data would

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Neil Conway
On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote: 2. Evaluate the performance of different hash index implementations and/or changes to the current implementation. My current plan is to keep the implementation as simple as possible and still provide the desired performance.

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 09:50:07AM -0400, Mark Mielke wrote: Kenneth Marshall wrote: On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote: You might find this patch useful: http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php ... Unfortunately, the patch doesn't

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Mark Mielke
Kenneth Marshall wrote: On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote: You might find this patch useful: http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php ... Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge it up to HEAD if you'd

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 12:55:37PM +0100, Heikki Linnakangas wrote: Neil Conway wrote: You might find this patch useful: http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php Oh, I had forgot about that. It implements the just store the hash in the index idea; it

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote: On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote: 2. Evaluate the performance of different hash index implementations and/or changes to the current implementation. My current plan is to keep the implementation as

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Heikki Linnakangas
Neil Conway wrote: You might find this patch useful: http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php Oh, I had forgot about that. It implements the just store the hash in the index idea; it also sorts the entries in a bucket by the hash value, which allows binary

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote: On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote: 2. Evaluate the performance of different hash index implementations and/or changes to the current implementation. My current plan is to keep the implementation as

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Mark Mielke
Kenneth Marshall wrote: I understand that a hash value is a many-to-one mapping. That is the point of the flag in the index. The flag means that there is only one item in the heap corresponding to that hash value. In this case we know that the value in the heap is the correct one and a possibly

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote: Kenneth Marshall wrote: I understand that a hash value is a many-to-one mapping. That is the point of the flag in the index. The flag means that there is only one item in the heap corresponding to that hash value. In this case we

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 11:08:13AM -0400, Brian Hurt wrote: Kenneth Marshall wrote: How likely is it that you will get a hash collision, two strings that are different that will hash to the same value? To avoid this requires a very large hash key (128 bits, minimum)- otherwise you

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 10:30:30AM -0400, Mark Mielke wrote: Kenneth Marshall wrote: I understand that a hash value is a many-to-one mapping. That is the point of the flag in the index. The flag means that there is only one item in the heap corresponding to that hash value. In this case we

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Brian Hurt
Kenneth Marshall wrote: How likely is it that you will get a hash collision, two strings that are different that will hash to the same value? To avoid this requires a very large hash key (128 bits, minimum)- otherwise you get into birthday attack problems. With a 32-bit hash, the

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Brian Hurt
Kenneth Marshall wrote: I understand that a hash value is a many-to-one mapping. That is the point of the flag in the index. The flag means that there is only one item in the heap corresponding to that hash value. In this case we know that the value in the heap is the correct one and a possibly

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Martijn van Oosterhout
On Thu, Sep 06, 2007 at 01:08:59PM -0500, Kenneth Marshall wrote: Since we already have to check the actual tuple values for any index lookup in postgresql, we could only store the full hash value and the corresponding TIDs in the bucket. Then when we lookup an item by calculating its hash, if

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Hannu Krosing
Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane: Gregory Stark [EMAIL PROTECTED] writes: Kenneth Marshall [EMAIL PROTECTED] writes: - What about multi-column indexes? The current implementation only supports 1 column. That seems kind of weird. It seems obvious that you

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes: Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane: No, because part of the deal is that you can do lookups using only the leading index columns. At least, all the existing multicolumn index types can do that. One approahc is not to mix

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Mark Mielke
Hannu Krosing wrote: One approahc is not to mix hashes, but to partition the hash, so that each column gets its N bits in the hash. How does that help? You still need all the keys to find out which bucket to look in. no. you need to look at only the buckets where that part of

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Michael Glaesemann
On Sep 6, 2007, at 10:53 , Mark Mielke wrote: I don't like the truncating hash suggestion because it limits the ability of a hash code to uniquely identify a key. AIUI, a hash can't be used as a unique identifier: it always needs to be rechecked due to the chance of collisions. There

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Hannu Krosing
Ühel kenal päeval, N, 2007-09-06 kell 09:38, kirjutas Tom Lane: Hannu Krosing [EMAIL PROTECTED] writes: Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane: No, because part of the deal is that you can do lookups using only the leading index columns. At least, all the existing

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Kenneth Marshall
On Thu, Sep 06, 2007 at 11:53:45AM -0400, Mark Mielke wrote: Hannu Krosing wrote: One approahc is not to mix hashes, but to partition the hash, so that each column gets its N bits in the hash. How does that help? You still need all the keys to find out which bucket to look in.

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Mark Mielke
Michael Glaesemann wrote: On Sep 6, 2007, at 10:53 , Mark Mielke wrote: I don't like the truncating hash suggestion because it limits the ability of a hash code to uniquely identify a key. AIUI, a hash can't be used as a unique identifier: it always needs to be rechecked due to the chance of

Re: [HACKERS] Hash index todo list item

2007-09-05 Thread Kenneth Marshall
On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote: Dear PostgreSQL Hackers: After following the hackers mailing list for quite a while, I am going to start investigating what will need to be done to improve hash index performance. Below are the pieces of this project that I

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Simon Riggs
On Sun, 2007-09-02 at 13:04 -0500, Kenneth Marshall wrote: Dear PostgreSQL Hackers: After following the hackers mailing list for quite a while, I am going to start investigating what will need to be done to improve hash index performance. Below are the pieces of this project that I am

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Kenneth Marshall
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this point? Sounds pretty good. Let me brain-dump one item on you: one thing that hash currently has over

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Kenneth Marshall
On Mon, Sep 03, 2007 at 10:33:54AM +0100, Simon Riggs wrote: This is the rough plan. Does anyone see anything critical that is missing at this point? Please send me any suggestions for test data and various performance test ideas, since I will be working on that first. Sounds good.

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Gregory Stark
Kenneth Marshall [EMAIL PROTECTED] writes: On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this point? Sounds pretty good. Let me brain-dump one item on

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: Kenneth Marshall [EMAIL PROTECTED] writes: - What about multi-column indexes? The current implementation only supports 1 column. That seems kind of weird. It seems obvious that you mix the three hashes together which reduces it to the solved problem.

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Ben Tilly
On 9/3/07, Gregory Stark [EMAIL PROTECTED] wrote: Kenneth Marshall [EMAIL PROTECTED] writes: On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Kenneth Marshall
On Mon, Sep 03, 2007 at 05:20:34PM -0700, Ben Tilly wrote: That raises a very random thought. One of the nicer features of Oracle is the ability to have function-based indexes. So you could index, say, trim(lower(person.name)). There are a *lot* of practical situations where that comes in

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Tom Lane
Ben Tilly [EMAIL PROTECTED] writes: That raises a very random thought. One of the nicer features of Oracle is the ability to have function-based indexes. So you could index, say, trim(lower(person.name)). Is there any prospect of postgres aquiring that functionality? Uh, no, since it's

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Ben Tilly
On 9/3/07, Tom Lane [EMAIL PROTECTED] wrote: Ben Tilly [EMAIL PROTECTED] writes: That raises a very random thought. One of the nicer features of Oracle is the ability to have function-based indexes. So you could index, say, trim(lower(person.name)). Is there any prospect of postgres

[HACKERS] Hash index todo list item

2007-09-02 Thread Kenneth Marshall
Dear PostgreSQL Hackers: After following the hackers mailing list for quite a while, I am going to start investigating what will need to be done to improve hash index performance. Below are the pieces of this project that I am currently considering: 1. Characterize the current hash index

Re: [HACKERS] Hash index todo list item

2007-09-02 Thread Tom Lane
Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this point? Sounds pretty good. Let me brain-dump one item on you: one thing that hash currently has over btree is the ability to handle index items up to a full page.