On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote:
On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote:
This is a great starting point. I would appreciate it if you have the
time and could make it apply cleanly to HEAD.
Just to give you an update on this, I'll try to find
I have not followed this thread very closely. But just wanted to give my inputs.
From the results obtained, the average of all the hash probes is 141.8ms,
the average for btree is 168.5, a difference of about 27.The standard
deviations are about 23, so this is a statistically significant
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
Shreya Bhargava wrote:
Note that the bottom line for the problems with hash indexes is that the
current implementation doesn't offer any advantages over btree indexes. Hash
indexes need to be not only as good of a choice as btree indexes but
Shreya Bhargava wrote:
1. Populate the table with 80 million tuples.
2. Create HASH index on the table.
3. clear both linux cache psql buffers.
(exiting psql and restarting it cleared the psql buffers;
to clear linux cache, we used drop_cache command)
4. start psql
5. select on an
Shreya Bhargava [EMAIL PROTECTED] writes:
We performed some probe tests on our patch on
hash index and the original btree index to compare the
performance between the two.
Interesting, but ...
From the results obtained, the average of all the hash probes is 141.8ms, the
average for btree
Note that the bottom line for the problems with hash indexes is that the
current implementation doesn't offer any advantages over btree indexes. Hash
indexes need to be not only as good of a choice as btree indexes but
significantly better a significantly better choice at least for some specific
On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote:
On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote:
This is a great starting point. I would appreciate it if you have the
time and could make it apply cleanly to HEAD.
Just to give you an update on this, I'll try to find
Kenneth, I just pushed the revised patch (v2!). The revised approach
samples the parent relation to estimate the number of tuples rather than
performing a complete scan. In my tests, the estimate appears to be
accurate, erring on the larger side, which is fine.
Tom,
That is great. I am
Tom,
Thank you for the update. I am currently working on updating the
patch Neil Conway sent in against 8.0-ish that stores only the hash
in the index and locates the entries within the page using a binary
search. Then I will fold in your recent update.
On Sun, Oct 21, 2007 at 01:13:48PM -0700,
Tom,
That is great. I am looking forward to your patch. After the
issues that you needed to address, I think that it would be
reasonable to add a few more user settings for the hash index.
Fill-factor is too course a knob. The others that I have been
considering are:
maxtuples - Not really the
On Wed, Sep 05, 2007 at 03:07:03PM -0500, Kenneth Marshall wrote:
On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote:
Dear PostgreSQL Hackers:
After following the hackers mailing list for quite a while,
I am going to start investigating what will need to be done
to
This has been saved for the 8.4 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold
---
Tom Raney wrote:
We are pleased to announce an upcoming patch to the hash index code
which improves build time and
On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
We are pleased to announce an upcoming patch to the hash index code
which improves build time and index size, based on this item in the
TODO list:
During index creation, pre-sort the tuples to improve build speed
Kenneth Marshall [EMAIL PROTECTED] writes:
On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
Using our implementation, build times and index sizes are
comparable with btree index build times and index sizes.
...
That is super! (and timely)
It is pretty super. I have a few comments
On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote:
Kenneth Marshall [EMAIL PROTECTED] writes:
On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
Using our implementation, build times and index sizes are
comparable with btree index build times and index sizes.
...
On Sep 25, 2007, at 11:26 , Kenneth Marshall wrote:
Although I am very excited about this patch, I do not see any real
value
in including it in 8.3.
I don't think you have to worry about it being in 8.3. Feature freeze
was months ago.
Michael Glaesemann
grzm seespotcode net
Kenneth Marshall wrote:
On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote:
Kenneth Marshall [EMAIL PROTECTED] writes:
On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
Using our implementation, build times and index sizes are
comparable with btree index
We are pleased to announce an upcoming patch to the hash index code
which improves build time and index size, based on this item in the
TODO list:
During index creation, pre-sort the tuples to improve build speed
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php
Using our
--On Samstag, September 08, 2007 18:56:23 -0400 Mark Mielke
[EMAIL PROTECTED] wrote:
Kenneth Marshall wrote:
Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various
More random thoughts:
- Hash-Indices are best for unique keys, but every table needs a new hash
key, which means one more random page access. Is there any way to build
multi-_table_ indices? A join might then fetch all table rows with a given
unique key after one page fetch for the combined
Kenneth Marshall wrote:
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
Kenneth Marshall [EMAIL PROTECTED] writes:
... This is the rough plan. Does anyone see anything critical that
is missing at this point?
Sounds pretty good. Let me brain-dump one item on you: one
On Sat, Sep 08, 2007 at 06:56:23PM -0400, Mark Mielke wrote:
I think that if the case of 1 entry per hash becomes common enough to
be significant, and the key is stored in the hash, that a btree will
perform equal or better, and there is no point in pursuing such a hash
index model. This
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
Kenneth Marshall [EMAIL PROTECTED] writes:
... This is the rough plan. Does anyone see anything critical that
is missing at this point?
Sounds pretty good. Let me brain-dump one item on you: one thing that
hash currently has over
On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote:
Kenneth Marshall wrote:
I understand that a hash value is a many-to-one mapping. That is the
point of the flag in the index. The flag means that there is only one
item in the heap corresponding to that hash value. In this case we
Kenneth Marshall wrote:
Continuing this train of thought While it would make sense for larger
keys to store the hash in the index, if the key is smaller, particularly
if it is of fixed size, it would make sense to store the key in the index
instead. This would have the benefit of allowing
On Sat, Sep 08, 2007 at 05:14:09PM -0400, Mark Mielke wrote:
Kenneth Marshall wrote:
Continuing this train of thought While it would make sense for larger
keys to store the hash in the index, if the key is smaller, particularly
if it is of fixed size, it would make sense to store the key
Kenneth Marshall wrote:
The value of storing the
actual value, possibly as an option, is that the key check can be done
in the index without requiring a heap lookup to check the actual value
which would be a benefit for a unique index. I agree that supporting
variable length data would
On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
2. Evaluate the performance of different hash index implementations
and/or changes to the current implementation. My current plan is
to keep the implementation as simple as possible and still provide
the desired performance.
On Fri, Sep 07, 2007 at 09:50:07AM -0400, Mark Mielke wrote:
Kenneth Marshall wrote:
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
You might find this patch useful:
http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
...
Unfortunately, the patch doesn't
Kenneth Marshall wrote:
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
You might find this patch useful:
http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
...
Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
it up to HEAD if you'd
On Fri, Sep 07, 2007 at 12:55:37PM +0100, Heikki Linnakangas wrote:
Neil Conway wrote:
You might find this patch useful:
http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
Oh, I had forgot about that.
It implements the just store the hash in the index idea; it
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
2. Evaluate the performance of different hash index implementations
and/or changes to the current implementation. My current plan is
to keep the implementation as
Neil Conway wrote:
You might find this patch useful:
http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
Oh, I had forgot about that.
It implements the just store the hash in the index idea; it also sorts
the entries in a bucket by the hash value, which allows binary
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
2. Evaluate the performance of different hash index implementations
and/or changes to the current implementation. My current plan is
to keep the implementation as
Kenneth Marshall wrote:
I understand that a hash value is a many-to-one mapping. That is the
point of the flag in the index. The flag means that there is only one
item in the heap corresponding to that hash value. In this case we
know that the value in the heap is the correct one and a possibly
On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote:
Kenneth Marshall wrote:
I understand that a hash value is a many-to-one mapping. That is the
point of the flag in the index. The flag means that there is only one
item in the heap corresponding to that hash value. In this case we
On Fri, Sep 07, 2007 at 11:08:13AM -0400, Brian Hurt wrote:
Kenneth Marshall wrote:
How likely is it that you will get a hash collision, two strings that are
different that will hash to the same value? To avoid this requires a
very large hash key (128 bits, minimum)- otherwise you
On Fri, Sep 07, 2007 at 10:30:30AM -0400, Mark Mielke wrote:
Kenneth Marshall wrote:
I understand that a hash value is a many-to-one mapping. That is the
point of the flag in the index. The flag means that there is only one
item in the heap corresponding to that hash value. In this case we
Kenneth Marshall wrote:
How likely is it that you will get a hash collision, two strings that are
different that will hash to the same value? To avoid this requires a very
large hash key (128 bits, minimum)- otherwise you get into birthday attack
problems. With a 32-bit hash, the
Kenneth Marshall wrote:
I understand that a hash value is a many-to-one mapping. That is the
point of the flag in the index. The flag means that there is only one
item in the heap corresponding to that hash value. In this case we
know that the value in the heap is the correct one and a possibly
On Thu, Sep 06, 2007 at 01:08:59PM -0500, Kenneth Marshall wrote:
Since we already have to check the actual tuple values for any index
lookup in postgresql, we could only store the full hash value and the
corresponding TIDs in the bucket. Then when we lookup an item by
calculating its hash, if
Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane:
Gregory Stark [EMAIL PROTECTED] writes:
Kenneth Marshall [EMAIL PROTECTED] writes:
- What about multi-column indexes? The current implementation
only supports 1 column.
That seems kind of weird. It seems obvious that you
Hannu Krosing [EMAIL PROTECTED] writes:
Ãhel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane:
No, because part of the deal is that you can do lookups using only the
leading index columns. At least, all the existing multicolumn index
types can do that.
One approahc is not to mix
Hannu Krosing wrote:
One approahc is not to mix hashes, but to partition the hash, so that
each column gets its N bits in the hash.
How does that help? You still need all the keys to find out which
bucket to look in.
no. you need to look at only the buckets where that part of
On Sep 6, 2007, at 10:53 , Mark Mielke wrote:
I don't like the truncating hash suggestion because it limits the
ability of a hash code to uniquely identify a key.
AIUI, a hash can't be used as a unique identifier: it always needs to
be rechecked due to the chance of collisions. There
Ühel kenal päeval, N, 2007-09-06 kell 09:38, kirjutas Tom Lane:
Hannu Krosing [EMAIL PROTECTED] writes:
Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane:
No, because part of the deal is that you can do lookups using only the
leading index columns. At least, all the existing
On Thu, Sep 06, 2007 at 11:53:45AM -0400, Mark Mielke wrote:
Hannu Krosing wrote:
One approahc is not to mix hashes, but to partition the hash, so that
each column gets its N bits in the hash.
How does that help? You still need all the keys to find out which
bucket to look in.
Michael Glaesemann wrote:
On Sep 6, 2007, at 10:53 , Mark Mielke wrote:
I don't like the truncating hash suggestion because it limits the
ability of a hash code to uniquely identify a key.
AIUI, a hash can't be used as a unique identifier: it always needs to
be rechecked due to the chance of
On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote:
Dear PostgreSQL Hackers:
After following the hackers mailing list for quite a while,
I am going to start investigating what will need to be done
to improve hash index performance. Below are the pieces of
this project that I
On Sun, 2007-09-02 at 13:04 -0500, Kenneth Marshall wrote:
Dear PostgreSQL Hackers:
After following the hackers mailing list for quite a while,
I am going to start investigating what will need to be done
to improve hash index performance. Below are the pieces of
this project that I am
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
Kenneth Marshall [EMAIL PROTECTED] writes:
... This is the rough plan. Does anyone see anything critical that
is missing at this point?
Sounds pretty good. Let me brain-dump one item on you: one thing that
hash currently has over
On Mon, Sep 03, 2007 at 10:33:54AM +0100, Simon Riggs wrote:
This is the rough plan. Does anyone see anything critical that
is missing at this point? Please send me any suggestions for test
data and various performance test ideas, since I will be working
on that first.
Sounds good.
Kenneth Marshall [EMAIL PROTECTED] writes:
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
Kenneth Marshall [EMAIL PROTECTED] writes:
... This is the rough plan. Does anyone see anything critical that
is missing at this point?
Sounds pretty good. Let me brain-dump one item on
Gregory Stark [EMAIL PROTECTED] writes:
Kenneth Marshall [EMAIL PROTECTED] writes:
- What about multi-column indexes? The current implementation
only supports 1 column.
That seems kind of weird. It seems obvious that you mix the three hashes
together which reduces it to the solved problem.
On 9/3/07, Gregory Stark [EMAIL PROTECTED] wrote:
Kenneth Marshall [EMAIL PROTECTED] writes:
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
Kenneth Marshall [EMAIL PROTECTED] writes:
... This is the rough plan. Does anyone see anything critical that
is missing at this
On Mon, Sep 03, 2007 at 05:20:34PM -0700, Ben Tilly wrote:
That raises a very random thought. One of the nicer features of
Oracle is the ability to have function-based indexes. So you could
index, say, trim(lower(person.name)). There are a *lot* of practical
situations where that comes in
Ben Tilly [EMAIL PROTECTED] writes:
That raises a very random thought. One of the nicer features of
Oracle is the ability to have function-based indexes. So you could
index, say, trim(lower(person.name)).
Is there any prospect of postgres aquiring that functionality?
Uh, no, since it's
On 9/3/07, Tom Lane [EMAIL PROTECTED] wrote:
Ben Tilly [EMAIL PROTECTED] writes:
That raises a very random thought. One of the nicer features of
Oracle is the ability to have function-based indexes. So you could
index, say, trim(lower(person.name)).
Is there any prospect of postgres
Dear PostgreSQL Hackers:
After following the hackers mailing list for quite a while,
I am going to start investigating what will need to be done
to improve hash index performance. Below are the pieces of
this project that I am currently considering:
1. Characterize the current hash index
Kenneth Marshall [EMAIL PROTECTED] writes:
... This is the rough plan. Does anyone see anything critical that
is missing at this point?
Sounds pretty good. Let me brain-dump one item on you: one thing that
hash currently has over btree is the ability to handle index items up
to a full page.
60 matches
Mail list logo