subject:"\[HACKERS\] Hash index todo list item"

Re: [HACKERS] Hash index todo list item

2007-11-14 Thread Kenneth Marshall

On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote: On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote: This is a great starting point. I would appreciate it if you have the time and could make it apply cleanly to HEAD. Just to give you an update on this, I'll try to find

Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Gokulakannan Somasundaram

I have not followed this thread very closely. But just wanted to give my inputs. From the results obtained, the average of all the hash probes is 141.8ms, the average for btree is 168.5, a difference of about 27.The standard deviations are about 23, so this is a statistically significant

Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Jens-Wolfhard Schicke

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Shreya Bhargava wrote: Note that the bottom line for the problems with hash indexes is that the current implementation doesn't offer any advantages over btree indexes. Hash indexes need to be not only as good of a choice as btree indexes but

Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Heikki Linnakangas

Shreya Bhargava wrote: 1. Populate the table with 80 million tuples. 2. Create HASH index on the table. 3. clear both linux cache psql buffers. (exiting psql and restarting it cleared the psql buffers; to clear linux cache, we used drop_cache command) 4. start psql 5. select on an

Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Tom Lane

Shreya Bhargava [EMAIL PROTECTED] writes: We performed some probe tests on our patch on hash index and the original btree index to compare the performance between the two. Interesting, but ... From the results obtained, the average of all the hash probes is 141.8ms, the average for btree

Re: [HACKERS] Hash index todo list item

2007-11-05 Thread Shreya Bhargava

Note that the bottom line for the problems with hash indexes is that the current implementation doesn't offer any advantages over btree indexes. Hash indexes need to be not only as good of a choice as btree indexes but significantly better a significantly better choice at least for some specific

Re: [HACKERS] Hash index todo list item

2007-10-22 Thread Kenneth Marshall

On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote: On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote: This is a great starting point. I would appreciate it if you have the time and could make it apply cleanly to HEAD. Just to give you an update on this, I'll try to find

Re: [HACKERS] Hash index todo list item

2007-10-21 Thread Tom Raney

Kenneth, I just pushed the revised patch (v2!). The revised approach samples the parent relation to estimate the number of tuples rather than performing a complete scan. In my tests, the estimate appears to be accurate, erring on the larger side, which is fine. Tom, That is great. I am

Re: [HACKERS] Hash index todo list item

2007-10-21 Thread Kenneth Marshall

Tom, Thank you for the update. I am currently working on updating the patch Neil Conway sent in against 8.0-ish that stores only the hash in the index and locates the entries within the page using a binary search. Then I will fold in your recent update. On Sun, Oct 21, 2007 at 01:13:48PM -0700,

Re: [HACKERS] Hash index todo list item

2007-10-17 Thread Kenneth Marshall

Tom, That is great. I am looking forward to your patch. After the issues that you needed to address, I think that it would be reasonable to add a few more user settings for the hash index. Fill-factor is too course a knob. The others that I have been considering are: maxtuples - Not really the

Re: [HACKERS] Hash index todo list item

2007-10-12 Thread Kenneth Marshall

On Wed, Sep 05, 2007 at 03:07:03PM -0500, Kenneth Marshall wrote: On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote: Dear PostgreSQL Hackers: After following the hackers mailing list for quite a while, I am going to start investigating what will need to be done to

Re: [HACKERS] Hash index todo list item

2007-09-28 Thread Bruce Momjian

This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --- Tom Raney wrote: We are pleased to announce an upcoming patch to the hash index code which improves build time and

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Kenneth Marshall

On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: We are pleased to announce an upcoming patch to the hash index code which improves build time and index size, based on this item in the TODO list: During index creation, pre-sort the tuples to improve build speed

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Gregory Stark

Kenneth Marshall [EMAIL PROTECTED] writes: On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: Using our implementation, build times and index sizes are comparable with btree index build times and index sizes. ... That is super! (and timely) It is pretty super. I have a few comments

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Kenneth Marshall

On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote: Kenneth Marshall [EMAIL PROTECTED] writes: On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: Using our implementation, build times and index sizes are comparable with btree index build times and index sizes. ...

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Michael Glaesemann

On Sep 25, 2007, at 11:26 , Kenneth Marshall wrote: Although I am very excited about this patch, I do not see any real value in including it in 8.3. I don't think you have to worry about it being in 8.3. Feature freeze was months ago. Michael Glaesemann grzm seespotcode net

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Tom Raney

Kenneth Marshall wrote: On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote: Kenneth Marshall [EMAIL PROTECTED] writes: On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: Using our implementation, build times and index sizes are comparable with btree index

Re: [HACKERS] Hash index todo list item

2007-09-20 Thread Tom Raney

We are pleased to announce an upcoming patch to the hash index code which improves build time and index size, based on this item in the TODO list: During index creation, pre-sort the tuples to improve build speed http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php Using our

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Jens-Wolfhard Schicke

--On Samstag, September 08, 2007 18:56:23 -0400 Mark Mielke [EMAIL PROTECTED] wrote: Kenneth Marshall wrote: Along with the hypothetical performance wins, the hash index space efficiency would be improved by a similar factor. Obviously, all of these ideas would need to be tested in various

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Jens-Wolfhard Schicke

More random thoughts: - Hash-Indices are best for unique keys, but every table needs a new hash key, which means one more random page access. Is there any way to build multi-_table_ indices? A join might then fetch all table rows with a given unique key after one page fetch for the combined

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Mark Mielke

Kenneth Marshall wrote: On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this point? Sounds pretty good. Let me brain-dump one item on you: one

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Martijn van Oosterhout

On Sat, Sep 08, 2007 at 06:56:23PM -0400, Mark Mielke wrote: I think that if the case of 1 entry per hash becomes common enough to be significant, and the key is stored in the hash, that a btree will perform equal or better, and there is no point in pursuing such a hash index model. This

Re: [HACKERS] Hash index todo list item

2007-09-09 Thread Kenneth Marshall

On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this point? Sounds pretty good. Let me brain-dump one item on you: one thing that hash currently has over

Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Kenneth Marshall

On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote: Kenneth Marshall wrote: I understand that a hash value is a many-to-one mapping. That is the point of the flag in the index. The flag means that there is only one item in the heap corresponding to that hash value. In this case we

Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Mark Mielke

Kenneth Marshall wrote: Continuing this train of thought While it would make sense for larger keys to store the hash in the index, if the key is smaller, particularly if it is of fixed size, it would make sense to store the key in the index instead. This would have the benefit of allowing

Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Kenneth Marshall

On Sat, Sep 08, 2007 at 05:14:09PM -0400, Mark Mielke wrote: Kenneth Marshall wrote: Continuing this train of thought While it would make sense for larger keys to store the hash in the index, if the key is smaller, particularly if it is of fixed size, it would make sense to store the key

Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Mark Mielke

Kenneth Marshall wrote: The value of storing the actual value, possibly as an option, is that the key check can be done in the index without requiring a heap lookup to check the actual value which would be a benefit for a unique index. I agree that supporting variable length data would

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Neil Conway

On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote: 2. Evaluate the performance of different hash index implementations and/or changes to the current implementation. My current plan is to keep the implementation as simple as possible and still provide the desired performance.

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall

On Fri, Sep 07, 2007 at 09:50:07AM -0400, Mark Mielke wrote: Kenneth Marshall wrote: On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote: You might find this patch useful: http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php ... Unfortunately, the patch doesn't

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Mark Mielke

Kenneth Marshall wrote: On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote: You might find this patch useful: http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php ... Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge it up to HEAD if you'd

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall

On Fri, Sep 07, 2007 at 12:55:37PM +0100, Heikki Linnakangas wrote: Neil Conway wrote: You might find this patch useful: http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php Oh, I had forgot about that. It implements the just store the hash in the index idea; it

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall

On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote: On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote: 2. Evaluate the performance of different hash index implementations and/or changes to the current implementation. My current plan is to keep the implementation as

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Heikki Linnakangas

Neil Conway wrote: You might find this patch useful: http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php Oh, I had forgot about that. It implements the just store the hash in the index idea; it also sorts the entries in a bucket by the hash value, which allows binary

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall

On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote: On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote: 2. Evaluate the performance of different hash index implementations and/or changes to the current implementation. My current plan is to keep the implementation as

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Mark Mielke

Kenneth Marshall wrote: I understand that a hash value is a many-to-one mapping. That is the point of the flag in the index. The flag means that there is only one item in the heap corresponding to that hash value. In this case we know that the value in the heap is the correct one and a possibly

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall

On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote: Kenneth Marshall wrote: I understand that a hash value is a many-to-one mapping. That is the point of the flag in the index. The flag means that there is only one item in the heap corresponding to that hash value. In this case we

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall

On Fri, Sep 07, 2007 at 11:08:13AM -0400, Brian Hurt wrote: Kenneth Marshall wrote: How likely is it that you will get a hash collision, two strings that are different that will hash to the same value? To avoid this requires a very large hash key (128 bits, minimum)- otherwise you

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall

On Fri, Sep 07, 2007 at 10:30:30AM -0400, Mark Mielke wrote: Kenneth Marshall wrote: I understand that a hash value is a many-to-one mapping. That is the point of the flag in the index. The flag means that there is only one item in the heap corresponding to that hash value. In this case we

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Brian Hurt

Kenneth Marshall wrote: How likely is it that you will get a hash collision, two strings that are different that will hash to the same value? To avoid this requires a very large hash key (128 bits, minimum)- otherwise you get into birthday attack problems. With a 32-bit hash, the

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Brian Hurt

Kenneth Marshall wrote: I understand that a hash value is a many-to-one mapping. That is the point of the flag in the index. The flag means that there is only one item in the heap corresponding to that hash value. In this case we know that the value in the heap is the correct one and a possibly

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Martijn van Oosterhout

On Thu, Sep 06, 2007 at 01:08:59PM -0500, Kenneth Marshall wrote: Since we already have to check the actual tuple values for any index lookup in postgresql, we could only store the full hash value and the corresponding TIDs in the bucket. Then when we lookup an item by calculating its hash, if

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Hannu Krosing

Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane: Gregory Stark [EMAIL PROTECTED] writes: Kenneth Marshall [EMAIL PROTECTED] writes: - What about multi-column indexes? The current implementation only supports 1 column. That seems kind of weird. It seems obvious that you

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Tom Lane

Hannu Krosing [EMAIL PROTECTED] writes: Ãhel kenal pÃ¤eval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane: No, because part of the deal is that you can do lookups using only the leading index columns. At least, all the existing multicolumn index types can do that. One approahc is not to mix

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Mark Mielke

Hannu Krosing wrote: One approahc is not to mix hashes, but to partition the hash, so that each column gets its N bits in the hash. How does that help? You still need all the keys to find out which bucket to look in. no. you need to look at only the buckets where that part of

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Michael Glaesemann

On Sep 6, 2007, at 10:53 , Mark Mielke wrote: I don't like the truncating hash suggestion because it limits the ability of a hash code to uniquely identify a key. AIUI, a hash can't be used as a unique identifier: it always needs to be rechecked due to the chance of collisions. There

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Hannu Krosing

Ühel kenal päeval, N, 2007-09-06 kell 09:38, kirjutas Tom Lane: Hannu Krosing [EMAIL PROTECTED] writes: Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane: No, because part of the deal is that you can do lookups using only the leading index columns. At least, all the existing

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Kenneth Marshall

On Thu, Sep 06, 2007 at 11:53:45AM -0400, Mark Mielke wrote: Hannu Krosing wrote: One approahc is not to mix hashes, but to partition the hash, so that each column gets its N bits in the hash. How does that help? You still need all the keys to find out which bucket to look in.

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Mark Mielke

Michael Glaesemann wrote: On Sep 6, 2007, at 10:53 , Mark Mielke wrote: I don't like the truncating hash suggestion because it limits the ability of a hash code to uniquely identify a key. AIUI, a hash can't be used as a unique identifier: it always needs to be rechecked due to the chance of

Re: [HACKERS] Hash index todo list item

2007-09-05 Thread Kenneth Marshall

On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote: Dear PostgreSQL Hackers: After following the hackers mailing list for quite a while, I am going to start investigating what will need to be done to improve hash index performance. Below are the pieces of this project that I

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Simon Riggs

On Sun, 2007-09-02 at 13:04 -0500, Kenneth Marshall wrote: Dear PostgreSQL Hackers: After following the hackers mailing list for quite a while, I am going to start investigating what will need to be done to improve hash index performance. Below are the pieces of this project that I am

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Kenneth Marshall

On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this point? Sounds pretty good. Let me brain-dump one item on you: one thing that hash currently has over

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Kenneth Marshall

On Mon, Sep 03, 2007 at 10:33:54AM +0100, Simon Riggs wrote: This is the rough plan. Does anyone see anything critical that is missing at this point? Please send me any suggestions for test data and various performance test ideas, since I will be working on that first. Sounds good.

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Gregory Stark

Kenneth Marshall [EMAIL PROTECTED] writes: On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this point? Sounds pretty good. Let me brain-dump one item on

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Tom Lane

Gregory Stark [EMAIL PROTECTED] writes: Kenneth Marshall [EMAIL PROTECTED] writes: - What about multi-column indexes? The current implementation only supports 1 column. That seems kind of weird. It seems obvious that you mix the three hashes together which reduces it to the solved problem.

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Ben Tilly

On 9/3/07, Gregory Stark [EMAIL PROTECTED] wrote: Kenneth Marshall [EMAIL PROTECTED] writes: On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Kenneth Marshall

On Mon, Sep 03, 2007 at 05:20:34PM -0700, Ben Tilly wrote: That raises a very random thought. One of the nicer features of Oracle is the ability to have function-based indexes. So you could index, say, trim(lower(person.name)). There are a *lot* of practical situations where that comes in

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Tom Lane

Ben Tilly [EMAIL PROTECTED] writes: That raises a very random thought. One of the nicer features of Oracle is the ability to have function-based indexes. So you could index, say, trim(lower(person.name)). Is there any prospect of postgres aquiring that functionality? Uh, no, since it's

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Ben Tilly

On 9/3/07, Tom Lane [EMAIL PROTECTED] wrote: Ben Tilly [EMAIL PROTECTED] writes: That raises a very random thought. One of the nicer features of Oracle is the ability to have function-based indexes. So you could index, say, trim(lower(person.name)). Is there any prospect of postgres

[HACKERS] Hash index todo list item

2007-09-02 Thread Kenneth Marshall

Dear PostgreSQL Hackers: After following the hackers mailing list for quite a while, I am going to start investigating what will need to be done to improve hash index performance. Below are the pieces of this project that I am currently considering: 1. Characterize the current hash index

Re: [HACKERS] Hash index todo list item

2007-09-02 Thread Tom Lane

Kenneth Marshall [EMAIL PROTECTED] writes: ... This is the rough plan. Does anyone see anything critical that is missing at this point? Sounds pretty good. Let me brain-dump one item on you: one thing that hash currently has over btree is the ability to handle index items up to a full page.

60 matches

Mail list logo