Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-03-21 Thread Tom Lane
Last month I wrote: > It seems clear that our qsort.c is doing a pretty awful job of picking > qsort pivots, while glibc is mostly managing not to make that mistake. I re-ran Gary's test script using the just-committed improvements to qsort.c, and got pretty nice numbers (attached --- compare to h

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-03-02 Thread Bruce Momjian
gt; > Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create > Index > > behaviour) > > > > Ron <[EMAIL PROTECTED]> writes: > > > How are we choosing our pivots? > > > > See qsort.c: it looks like median of nine equally spaced input

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Dann Corbit
> -Original Message- > From: [EMAIL PROTECTED] [mailto:pgsql-hackers- > [EMAIL PROTECTED] On Behalf Of Markus Schaber > Sent: Thursday, February 16, 2006 5:45 AM > To: pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] qsort again

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Jonah H. Harris
Last night I implemented a non-recursive introsort in C... let me test it a bit more and then I'll post it here for everyone else to try out.On 2/16/06, Markus Schaber <[EMAIL PROTECTED]> wrote:Hi, Ron, Ron wrote:> ...and of course if you know enough about the data to be sorted so as to> constrain

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-17 Thread Dann Corbit
> -Original Message- > From: [EMAIL PROTECTED] [mailto:pgsql-hackers- > [EMAIL PROTECTED] On Behalf Of Tom Lane > Sent: Wednesday, February 15, 2006 5:22 PM > To: Ron > Cc: pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Tom Lane
Mark Lewis <[EMAIL PROTECTED]> writes: > I think we're actually on the same page here; you're right that the > constraint above ( f(a)==f(b) iff a==b ) can't be extended to data types > with more than 32 bits of value space. But the constraint I listed was > actually: > if a==b then f(a)==f(b) I

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Mark Lewis
On Thu, 2006-02-16 at 21:33 -0800, David Lang wrote: > > In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit > > sortKey as elsewhere suggested). The sorting key doesn't need to be a > > one-to-one mapping. > > that would violate your second contraint ( f(a)==f(b) iff (a==b) )

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Martijn van Oosterhout
On Fri, Feb 17, 2006 at 08:18:41AM -0800, Scott Lamb wrote: > On Feb 16, 2006, at 2:17 PM, Mark Lewis wrote: > >Data types which could probably provide a useful function for f > >would be > >int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). > > ...and with some work, floats (

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Scott Lamb
On Feb 16, 2006, at 2:17 PM, Mark Lewis wrote:Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). ...and with some work, floats (I think just the exponent would work, if nothing else). bytea. Probably just ab

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Markus Schaber
Hi, David, David Lang schrieb: >> In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit >> sortKey as elsewhere suggested). The sorting key doesn't need to be a >> one-to-one mapping. > that would violate your second contraint ( f(a)==f(b) iff (a==b) ) no, it doesn't. When b

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread David Lang
On Thu, 16 Feb 2006, Mark Lewis wrote: On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote: Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). How exactly do you imagine doing this for text? I could s

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Steinar H. Gunderson
On Fri, Feb 17, 2006 at 12:05:23AM +0100, PFC wrote: > I would have said a 64 bit int, but it's the same idea. However it > won't work for floats, which is a pity, because floats fit in 64 bits. Actually, you can compare IEEE floats directly as ints, as long as they're positive. (If

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Markus Schaber
Hi, PFC, PFC schrieb: > By the way, I'd like to declare my zipcode columns as SQL_ASCII > while the rest of my database is in UNICODE, so they are faster to > index and sort. Come on, MySQL does it... Another use case for parametric column definitions - charset definitions - and the first

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Mark Lewis
On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote: > > > Data types which could probably provide a useful function for f would be > > > int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). > > How exactly do you imagine doing this for text? > > I could see doing it for char(n)/

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread PFC
It seems that instead of maintaining a different sorting code path for each data type, you could get away with one generic path and one (hopefully faster) path if you allowed data types to optionally support a 'sortKey' interface by providing a function f which maps inputs to 32- bit int output

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Greg Stark
Markus Schaber <[EMAIL PROTECTED]> writes: > Hmm, to remove redundancy, I'd change the <= to a < and define: > > if a==b then f(a)==f(b) > if a > > Data types which could probably provide a useful function for f would be > > int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 02:17:36PM -0800, Mark Lewis wrote: > It seems that instead of maintaining a different sorting code path for > each data type, you could get away with one generic path and one > (hopefully faster) path if you allowed data types to optionally support > a 'sortKey' interface b

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Markus Schaber
Hi, Mark, Mark Lewis schrieb: > It seems that instead of maintaining a different sorting code path for > each data type, you could get away with one generic path and one > (hopefully faster) path if you allowed data types to optionally support > a 'sortKey' interface by providing a function f whi

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Mark Lewis
On Thu, 2006-02-16 at 12:15 -0500, Tom Lane wrote: > Once or twice we've kicked around the idea of having some > datatype-specific sorting code paths alongside the general-purpose one, > but I can't honestly see this as being workable from a code maintenance > standpoint. > >

Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Neil Conway
On Thu, 2006-02-16 at 12:35 +0100, Steinar H. Gunderson wrote: > glibc-2.3.5/stdlib/qsort.c: > > /* Order size using quicksort. This implementation incorporates > four optimizations discussed in Sedgewick: > > I can't see any references to merge sort in there at all. stdlib/qsort.c defin

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-16 Thread Tom Lane
"Gary Doades" <[EMAIL PROTECTED]> writes: > I think the reason I wasn't seeing performance issues with normal sort > operations is because they use work_mem not maintenance_work_mem which was > only set to 2048 anyway. Does that sound right? Very probable. Do you want to test the theory by jackin

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Tom Lane
"Craig A. James" <[EMAIL PROTECTED]> writes: > You can also use this trick when the optimizer is asked for "fastest first > result." Say you have a cursor on a column of numbers with good > distribution. If you do a bucket sort on the first two or three digits only, > you know the first "page"

Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Craig A. James
Markus Schaber wrote: Ron wrote: ...and of course if you know enough about the data to be sorted so as to constrain it appropriately, one should use a non comparison based O(N) sorting algorithm rather than any of the general comparison based O(NlgN) methods. Sounds interesting, could you give

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Gary Doades
> "Gary Doades" <[EMAIL PROTECTED]> writes: >> I think the reason I wasn't seeing performance issues with normal sort >> operations is because they use work_mem not maintenance_work_mem which >> was >> only set to 2048 anyway. Does that sound right? > > Very probable. Do you want to test the theor

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 08:22:55AM -0500, Ron wrote: > 3= Especially in modern systems where the gap between internal CPU > bandwidth and memory bandwidth is so great, the overhead of memory > accesses for comparisons and moves is the majority of the overhead > for both the pivot choosing and th

Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Markus Schaber
Hi, Ron, Ron wrote: > ...and of course if you know enough about the data to be sorted so as to > constrain it appropriately, one should use a non comparison based O(N) > sorting algorithm rather than any of the general comparison based > O(NlgN) methods. Sounds interesting, could you give us som

Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Ron
At 06:35 AM 2/16/2006, Steinar H. Gunderson wrote: On Wed, Feb 15, 2006 at 11:30:54PM -0500, Ron wrote: > Even better (and more easily scaled as the number of GPR's in the CPU > changes) is to use > the set {L; L+1; L+2; t>>1; R-2; R-1; R} > This means that instead of 7 random memory accesses, we

Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Steinar H. Gunderson
On Wed, Feb 15, 2006 at 11:30:54PM -0500, Ron wrote: > Even better (and more easily scaled as the number of GPR's in the CPU > changes) is to use > the set {L; L+1; L+2; t>>1; R-2; R-1; R} > This means that instead of 7 random memory accesses, we have 3; two > of which result in a > burst access

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Gary Doades
Tom Lane wrote: > I increased the size of the test case by 10x (basically s/10/100/) > which is enough to push it into the external-sort regime. I get > amazingly stable runtimes now --- I didn't have the patience to run 100 > trials, but in 30 trials I have slowest 11538 msec and fastest

Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-15 Thread Ron
At 08:21 PM 2/15/2006, Tom Lane wrote: Ron <[EMAIL PROTECTED]> writes: > How are we choosing our pivots? See qsort.c: it looks like median of nine equally spaced inputs (ie, the 1/8th points of the initial input array, plus the end points), implemented as two rounds of median-of-three choices.

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-15 Thread Neil Conway
On Wed, 2006-02-15 at 18:28 -0500, Tom Lane wrote: > It seems clear that our qsort.c is doing a pretty awful job of picking > qsort pivots, while glibc is mostly managing not to make that mistake. > I haven't looked at the glibc code yet to see what they are doing > differently. glibc qsort is act

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-15 Thread Simon Riggs
On Wed, 2006-02-15 at 19:59 -0500, Tom Lane wrote: > I get > amazingly stable runtimes now --- I didn't have the patience to run 100 > trials, but in 30 trials I have slowest 11538 msec and fastest 11144 msec. > So this code path is definitely not very sensitive to this data > distribution. "The

Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Christopher Kings-Lynne
Ouch! That confirms my problem. I generated the random test case because it was easier than including the dump of my tables, but you can appreciate that tables 20 times the size are basically crippled when it comes to creating an index on them. I have to say that I restored a few gigabyte dum

Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Ron <[EMAIL PROTECTED]> writes: > How are we choosing our pivots? See qsort.c: it looks like median of nine equally spaced inputs (ie, the 1/8th points of the initial input array, plus the end points), implemented as two rounds of median-of-three choices. With half of the data inputs zero, it's n

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
I wrote: > Gary Doades <[EMAIL PROTECTED]> writes: >> Ouch! That confirms my problem. I generated the random test case because >> it was easier than including the dump of my tables, but you can >> appreciate that tables 20 times the size are basically crippled when it >> comes to creating an ind

Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-15 Thread Ron
This behavior is consistent with the pivot choosing algorithm assuming certain distribution(s) for the data. For instance, median-of-three partitioning is known to be pessimal when the data is geometrically or hyper-geometrically distributed. Also, care must be taken that sometimes is not whe

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Gary Doades <[EMAIL PROTECTED]> writes: > Ouch! That confirms my problem. I generated the random test case because > it was easier than including the dump of my tables, but you can > appreciate that tables 20 times the size are basically crippled when it > comes to creating an index on them. Ac

Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Gary Doades <[EMAIL PROTECTED]> writes: > Is this likely to hit me in a random fashion during normal operation, > joins, sorts, order by for example? Yup, anytime you're passing data with that kind of distribution through a sort. > So the options are: > 1) Fix the included qsort.c code and use t

Re: qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Gary Doades
Tom Lane wrote: For some reason I hadn't immediately twigged to the fact that your test script is just N repetitions of the exact same structure with random data. So it's not so surprising that you get random variations in behavior with different test data sets. > It seems clear that our qsort

qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-15 Thread Tom Lane
Gary Doades <[EMAIL PROTECTED]> writes: > If I run the script again, it is not always the first case that is slow, > it varies from run to run, which is why I repeated it quite a few times > for the test. For some reason I hadn't immediately twigged to the fact that your test script is just N re