:
-Original Message-
From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
[EMAIL PROTECTED] On Behalf Of Tom Lane
Sent: Wednesday, February 15, 2006 5:22 PM
To: Ron
Cc: pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] qsort again (was Re: [PERFORM
Hi, David,
David Lang schrieb:
In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit
sortKey as elsewhere suggested). The sorting key doesn't need to be a
one-to-one mapping.
that would violate your second contraint ( f(a)==f(b) iff (a==b) )
no, it doesn't.
When both
Hi, Ron,
Ron schrieb:
OK, so here's _a_ way (there are others) to obtain a mapping such that
if a b then f(a) f (b) and
if a == b then f(a) == f(b)
Pretend each row is a integer of row size (so a 2KB row becomes a 16Kb
integer; a 4KB row becomes a 32Kb integer; etc)
Since even a 1TB
At 05:19 AM 2/17/2006, Markus Schaber wrote:
Hi, Ron,
Ron schrieb:
OK, so here's _a_ way (there are others) to obtain a mapping such that
if a b then f(a) f (b) and
if a == b then f(a) == f(b)
Pretend each row is a integer of row size (so a 2KB row becomes a 16Kb
integer; a 4KB row
On Fri, Feb 17, 2006 at 08:23:40AM -0500, Ron wrote:
For this mapping, you need a full table sort.
One physical IO pass should be all that's needed. However, let's
pretend you are correct and that we do need to sort the table to get
the key mapping. Even so, we would only need to do it
On Feb 16, 2006, at 2:17 PM, Mark Lewis wrote:Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). ...and with some work, floats (I think just the exponent would work, if nothing else). bytea. Probably just
On Fri, Feb 17, 2006 at 08:18:41AM -0800, Scott Lamb wrote:
On Feb 16, 2006, at 2:17 PM, Mark Lewis wrote:
Data types which could probably provide a useful function for f
would be
int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).
...and with some work, floats (I think
At 10:53 AM 2/17/2006, Martijn van Oosterhout wrote:
On Fri, Feb 17, 2006 at 08:23:40AM -0500, Ron wrote:
For this mapping, you need a full table sort.
One physical IO pass should be all that's needed. However, let's
pretend you are correct and that we do need to sort the table to get
the
On fös, 2006-02-17 at 08:01 -0500, Ron wrote:
At 04:24 AM 2/17/2006, Ragnar wrote:
On fös, 2006-02-17 at 01:20 -0500, Ron wrote:
OK, so here's _a_ way (there are others) to obtain a mapping such that
if a b then f(a) f (b) and
if a == b then f(a) == f(b)
By scanning the
Last night I implemented a non-recursive introsort in C... let me test it a bit more and then I'll post it here for everyone else to try out.On 2/16/06, Markus Schaber
[EMAIL PROTECTED] wrote:Hi, Ron,
Ron wrote: ...and of course if you know enough about the data to be sorted so as to constrain it
-Original Message-
From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
[EMAIL PROTECTED] On Behalf Of Markus Schaber
Sent: Thursday, February 16, 2006 5:45 AM
To: pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange
On 2/17/06, Ragnar [EMAIL PROTECTED] wrote:
Say again ?
Let us say you have 1 billion rows, where the
column in question contains strings like
baaaaaa
baaaaab
baaaaac
...
not necessarily in this order on disc of course
The minimum value
Tom Lane wrote:
I increased the size of the test case by 10x (basically s/10/100/)
which is enough to push it into the external-sort regime. I get
amazingly stable runtimes now --- I didn't have the patience to run 100
trials, but in 30 trials I have slowest 11538 msec and fastest
On Thu, Feb 16, 2006 at 08:22:55AM -0500, Ron wrote:
3= Especially in modern systems where the gap between internal CPU
bandwidth and memory bandwidth is so great, the overhead of memory
accesses for comparisons and moves is the majority of the overhead
for both the pivot choosing and the
Gary Doades [EMAIL PROTECTED] writes:
I think the reason I wasn't seeing performance issues with normal sort
operations is because they use work_mem not maintenance_work_mem which
was
only set to 2048 anyway. Does that sound right?
Very probable. Do you want to test the theory by jacking
At 09:48 AM 2/16/2006, Martijn van Oosterhout wrote:
On Thu, Feb 16, 2006 at 08:22:55AM -0500, Ron wrote:
3= Especially in modern systems where the gap between internal CPU
bandwidth and memory bandwidth is so great, the overhead of memory
accesses for comparisons and moves is the majority of
Ron [EMAIL PROTECTED] writes:
Your cost comment basically agrees with mine regarding the cost of
random memory accesses. The good news is that the number of datums
to be examined during the pivot choosing process is small enough that
the datums can fit into CPU cache while the pointers to
At 10:52 AM 2/16/2006, Ron wrote:
At 09:48 AM 2/16/2006, Martijn van Oosterhout wrote:
Where this does become interesting is where we can convert a datum to
an integer such that if f(A) f(B) then A B. Then we can sort on
f(X) first with just integer comparisons and then do a full tuple
On Thu, Feb 16, 2006 at 11:32:55AM -0500, Ron wrote:
At 10:52 AM 2/16/2006, Ron wrote:
In fact we can do better.
Using hash codes or what-not to map datums to keys and then sorting
just the keys and the pointers to those datums followed by an
optional final pass where we do the actual data
Craig A. James [EMAIL PROTECTED] writes:
You can also use this trick when the optimizer is asked for fastest first
result. Say you have a cursor on a column of numbers with good
distribution. If you do a bucket sort on the first two or three digits only,
you know the first page of results
Gary Doades [EMAIL PROTECTED] writes:
I think the reason I wasn't seeing performance issues with normal sort
operations is because they use work_mem not maintenance_work_mem which was
only set to 2048 anyway. Does that sound right?
Very probable. Do you want to test the theory by jacking that
On Feb 16, 2006, at 8:32 AM, Ron wrote:
Let's pretend that we have the typical DB table where rows are
~2-4KB apiece. 1TB of storage will let us have 256M-512M rows in
such a table.
A 32b hash code can be assigned to each row value such that only
exactly equal rows will have the same
At 12:19 PM 2/16/2006, Scott Lamb wrote:
On Feb 16, 2006, at 8:32 AM, Ron wrote:
Let's pretend that we have the typical DB table where rows are
~2-4KB apiece. 1TB of storage will let us have 256M-512M rows in
such a table.
A 32b hash code can be assigned to each row value such that only
On Thu, 2006-02-16 at 12:15 -0500, Tom Lane wrote:
Once or twice we've kicked around the idea of having some
datatype-specific sorting code paths alongside the general-purpose one,
but I can't honestly see this as being workable from a code maintenance
standpoint.
Hi, Mark,
Mark Lewis schrieb:
It seems that instead of maintaining a different sorting code path for
each data type, you could get away with one generic path and one
(hopefully faster) path if you allowed data types to optionally support
a 'sortKey' interface by providing a function f which
On Thu, Feb 16, 2006 at 02:17:36PM -0800, Mark Lewis wrote:
It seems that instead of maintaining a different sorting code path for
each data type, you could get away with one generic path and one
(hopefully faster) path if you allowed data types to optionally support
a 'sortKey' interface by
Markus Schaber [EMAIL PROTECTED] writes:
Hmm, to remove redundancy, I'd change the = to a and define:
if a==b then f(a)==f(b)
if ab then f(a)=f(b)
Data types which could probably provide a useful function for f would be
int2, int4, oid, and possibly int8 and text (at least for
It seems that instead of maintaining a different sorting code path for
each data type, you could get away with one generic path and one
(hopefully faster) path if you allowed data types to optionally support
a 'sortKey' interface by providing a function f which maps inputs to 32-
bit int
On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote:
Data types which could probably provide a useful function for f would be
int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).
How exactly do you imagine doing this for text?
I could see doing it for
Hi, PFC,
PFC schrieb:
By the way, I'd like to declare my zipcode columns as SQL_ASCII
while the rest of my database is in UNICODE, so they are faster to
index and sort. Come on, MySQL does it...
Another use case for parametric column definitions - charset definitions
- and the first
On Fri, Feb 17, 2006 at 12:05:23AM +0100, PFC wrote:
I would have said a 64 bit int, but it's the same idea. However it
won't work for floats, which is a pity, because floats fit in 64 bits.
Actually, you can compare IEEE floats directly as ints, as long as they're
positive. (If
On Thu, 16 Feb 2006, Mark Lewis wrote:
On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote:
Data types which could probably provide a useful function for f would be
int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).
How exactly do you imagine doing this for text?
I could
At 01:47 PM 2/16/2006, Ron wrote:
At 12:19 PM 2/16/2006, Scott Lamb wrote:
On Feb 16, 2006, at 8:32 AM, Ron wrote:
Let's pretend that we have the typical DB table where rows are
~2-4KB apiece. 1TB of storage will let us have 256M-512M rows in
such a table.
A 32b hash code can be assigned to
I wrote:
Gary Doades [EMAIL PROTECTED] writes:
Ouch! That confirms my problem. I generated the random test case because
it was easier than including the dump of my tables, but you can
appreciate that tables 20 times the size are basically crippled when it
comes to creating an index on
On Wed, 2006-02-15 at 19:59 -0500, Tom Lane wrote:
I get
amazingly stable runtimes now --- I didn't have the patience to run 100
trials, but in 30 trials I have slowest 11538 msec and fastest 11144 msec.
So this code path is definitely not very sensitive to this data
distribution.
The
On Wed, 2006-02-15 at 18:28 -0500, Tom Lane wrote:
It seems clear that our qsort.c is doing a pretty awful job of picking
qsort pivots, while glibc is mostly managing not to make that mistake.
I haven't looked at the glibc code yet to see what they are doing
differently.
glibc qsort is
36 matches
Mail list logo