Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-23 Thread Robert Haas
On Wed, Jan 21, 2015 at 2:22 AM, Peter Geoghegan p...@heroku.com wrote: You'll probably prefer the attached. This patch works by disabling abbreviation, but only after writing out runs, with the final merge left to go. That way, it doesn't matter when abbreviated keys are not read back from

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-23 Thread Robert Haas
On Fri, Jan 23, 2015 at 2:18 AM, David Rowley dgrowle...@gmail.com wrote: On 20 January 2015 at 17:10, Peter Geoghegan p...@heroku.com wrote: On Mon, Jan 19, 2015 at 7:47 PM, Michael Paquier michael.paqu...@gmail.com wrote: With your patch applied, the failure with MSVC disappeared, but

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-22 Thread David Rowley
On 20 January 2015 at 17:10, Peter Geoghegan p...@heroku.com wrote: On Mon, Jan 19, 2015 at 7:47 PM, Michael Paquier michael.paqu...@gmail.com wrote: With your patch applied, the failure with MSVC disappeared, but there is still a warning showing up: (ClCompile target) -

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-21 Thread Andrew Gierth
Peter == Peter Geoghegan p...@heroku.com writes: Peter You'll probably prefer the attached. This patch works by Peter disabling abbreviation, but only after writing out runs, with Peter the final merge left to go. That way, it doesn't matter when Peter abbreviated keys are not read back from

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-21 Thread Andrew Gierth
Peter == Peter Geoghegan p...@heroku.com writes: Peter Basically, the intersection of the datum sort case with Peter abbreviated keys seems complicated. Not to me. To me it seems completely trivial. Now, I follow this general principle that someone who is not doing the work should never say

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Andrew Gierth
Robert == Robert Haas robertmh...@gmail.com writes: Robert All right, it seems Tom is with you on that point, so after Robert some study, I've committed this with very minor modifications. This caught my eye (thanks to conflict with GS patch): * In the future, we should consider forcing the

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Mon, Jan 19, 2015 at 9:29 PM, Peter Geoghegan p...@heroku.com wrote: I think that the attached patch should at least fix that much. Maybe the problem on the other animal is also explained by the lack of this, since there could also be a MinGW-ish strxfrm_l(), I suppose. Committed that,

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 3:34 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 3:34 PM, Robert Haas robertmh...@gmail.com wrote: Dear me. Peter, can you fix this RSN? Investigating. It's certainly possible to fix Andrew's test case with the attached. I'm not sure that that's

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 6:27 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Robert == Robert Haas robertmh...@gmail.com writes: Robert All right, it seems Tom is with you on that point, so after Robert some study, I've committed this with very minor modifications. While hacking up a

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 3:46 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: The comment in tuplesort_begin_datum that abbreviation can't be used seems wrong to me; why is the copy of the original value pointed to by stup->tuple (in the case of by-reference types, and abbreviation is

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Andrew Gierth
Robert == Robert Haas robertmh...@gmail.com writes: Robert All right, it seems Tom is with you on that point, so after Robert some study, I've committed this with very minor modifications. While hacking up a patch to demonstrate the simplicity of extending this to the Datum sorter, I seem to

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 2:00 PM, Peter Geoghegan p...@heroku.com wrote: Maybe that's the wrong way of fixing that, but for now I don't think it's acceptable that abbreviation isn't always used in certain cases where it could make sense (e.g. not for simple GroupAggregates with a single

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 3:34 PM, Robert Haas robertmh...@gmail.com wrote: Dear me. Peter, can you fix this RSN? Investigating. -- Peter Geoghegan

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 3:33 PM, Robert Haas robertmh...@gmail.com wrote: Peter, this made bowerbird (Windows 8/Visual Studio) build, but it's failing make check. Ditto hamerkop (Windows 2k8/VC++) and currawong (Windows XP Pro/MSVC++). jacana (Windows 8/gcc) and brolga (Windows XP

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 3:57 PM, Peter Geoghegan p...@heroku.com wrote: It's certainly possible to fix Andrew's test case with the attached. I'm not sure that that's the appropriate fix, though: there is probably a case to be made for not bothering with abbreviation once we've read tuples in

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 10:54 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 19, 2015 at 9:29 PM, Peter Geoghegan p...@heroku.com wrote: I think that the attached patch should at least fix that much. Maybe the problem on the other animal is also explained by the lack of this, since

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote: I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it thereafter. The spill-to-disk case is at least as compelling at the internal

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 9:33 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 6:30 PM, Robert Haas robertmh...@gmail.com wrote: I don't want to change the on-disk format for tapes without a lot more discussion. Can you come up with a fix that avoids that for now? A more

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote: I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 6:30 PM, Robert Haas robertmh...@gmail.com wrote: I don't want to change the on-disk format for tapes without a lot more discussion. Can you come up with a fix that avoids that for now? A more conservative approach would be to perform conversion on-the-fly once more.

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 7:07 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 3:57 PM, Peter Geoghegan p...@heroku.com wrote: It's certainly possible to fix Andrew's test case with the attached. I'm not sure that that's the appropriate fix, though: there is probably a case to

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 6:39 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 6:34 PM, Robert Haas robertmh...@gmail.com wrote: That might be OK. Probably needs a bit of performance testing to see how it looks. Well, we're still only doing it when we do our final merge. So

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 5:42 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote: I was assuming we were going to fix this by undoing the abbreviation (as

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 5:46 PM, Peter Geoghegan p...@heroku.com wrote: Would you prefer it if the spill-to-disk case aborted in the style of low entropy keys? That doesn't seem significantly safer than this, and it's certainly not acceptable from a performance perspective. BTW, I can write

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote: I was assuming we were going to fix this by undoing the abbreviation (as in the abort case) when we spill to disk, and not bothering with it

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 6:34 PM, Robert Haas robertmh...@gmail.com wrote: That might be OK. Probably needs a bit of performance testing to see how it looks. Well, we're still only doing it when we do our final merge. So that's only doubling the number of conversions required, which if we're

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-19 Thread Michael Paquier
On Tue, Jan 20, 2015 at 11:29 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jan 19, 2015 at 5:59 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jan 19, 2015 at 5:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: You did notice that bowerbird isn't building, right?

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-19 Thread Alvaro Herrera
Peter Geoghegan wrote: It appears that the buildfarm animal brolga isn't happy about this patch. I'm not sure why, since I thought we already figured out bugs or other inconsistencies in various strxfrm() implementations. You did notice that bowerbird isn't building, right?

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-19 Thread Robert Haas
On Mon, Jan 19, 2015 at 5:43 PM, Peter Geoghegan p...@heroku.com wrote: It appears that the buildfarm animal brolga isn't happy about this patch. I'm not sure why, since I thought we already figured out bugs or other inconsistencies in various strxfrm() implementations. Well, the first thing

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-19 Thread Peter Geoghegan
On Mon, Jan 19, 2015 at 7:47 PM, Michael Paquier michael.paqu...@gmail.com wrote: On MinGW-32, not that I know of: $ find . -name *.h | xgrep strxfrm_l ./lib/gcc/mingw32/4.8.1/include/c++/mingw32/bits/c++config.h:/* Define if strxfrm_l is available in string.h. */

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-19 Thread Peter Geoghegan
On Mon, Jan 19, 2015 at 5:59 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jan 19, 2015 at 5:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: You did notice that bowerbird isn't building, right?

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-19 Thread Peter Geoghegan
On Mon, Jan 19, 2015 at 5:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: You did notice that bowerbird isn't building, right? http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2015-01-19%2023%3A54%3A46 Yeah. Looks like strxfrm_l() isn't available on the animal, for

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-19 Thread Robert Haas
On Tue, Dec 2, 2014 at 8:28 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Dec 2, 2014 at 2:16 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote: Well, maybe you should make the updates we've agreed on and I can take another

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-19 Thread Robert Haas
On Mon, Jan 19, 2015 at 3:33 PM, Robert Haas robertmh...@gmail.com wrote: All right, it seems Tom is with you on that point, so after some study, I've committed this with very minor modifications. Sorry for the long delay. I have not committed the 0002 patch, though, because I haven't

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-19 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote: On the PPC64 machine I normally use for performance testing, it takes about 6.3 seconds to build the index with the commit just before this one. With this commit, it drops to 1.9 seconds. That's more than a 3x speedup! Now, if I change the

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-09 Thread Peter Geoghegan
There is an interesting thread about strcoll() overhead over on -general: http://www.postgresql.org/message-id/cab25xexnondrmc1_cy3jvmb0tmydm38ef9q2d7xla0rbncj...@mail.gmail.com My guess was that this person experienced a rather unexpected downside of spilling to disk when sorting on a text

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-03 Thread Robert Haas
On Tue, Dec 2, 2014 at 5:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan p...@heroku.com writes: On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas robertmh...@gmail.com wrote: Right, and what I'm saying is that maybe the applicability flag shouldn't be stored in the SortSupport object, but

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-03 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 1:21 PM, Peter Geoghegan p...@heroku.com wrote: Incidentally, I think that an under-appreciated possible source of regressions here is that attributes abbreviated have a strong physical/logical correlation. I could see a small regression for one such case even though my

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Robert Haas
On Tue, Nov 25, 2014 at 1:38 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Nov 25, 2014 at 4:01 AM, Robert Haas robertmh...@gmail.com wrote: - This appears to needlessly reindent the comments for PG_CACHE_LINE_SIZE. Actually, the word only is removed (because PG_CACHE_LINE_SIZE has a new

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 1:00 PM, Robert Haas robertmh...@gmail.com wrote: I'd prefer not to have a #define in pg_config_manual.h. Only stuff that we expect a reasonably decent number of users to need to change should be in that file, and this is too marginal for that. If anybody other than

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Robert Haas
On Tue, Dec 2, 2014 at 4:21 PM, Peter Geoghegan p...@heroku.com wrote: I'm not sure about that. I'd prefer to have tuplesort (and one or two other sites) set the "abbreviation is possible in principle" flag. Otherwise, sortsupport needs to assume that the leading attribute is going to be the

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote: Well, maybe you should make the updates we've agreed on and I can take another look at it. Agreed. But I didn't think that I was proposing to change anything about the level at which the decision about whether to

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Robert Haas
On Tue, Dec 2, 2014 at 5:16 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote: Well, maybe you should make the updates we've agreed on and I can take another look at it. Agreed. But I didn't think that I was proposing to

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas robertmh...@gmail.com wrote: Right, and what I'm saying is that maybe the applicability flag shouldn't be stored in the SortSupport object, but passed down as an argument. But then how does that information get to any given sortsupport routine?

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes: On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas robertmh...@gmail.com wrote: Right, and what I'm saying is that maybe the applicability flag shouldn't be stored in the SortSupport object, but passed down as an argument. But then how does that information

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 2:16 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote: Well, maybe you should make the updates we've agreed on and I can take another look at it. Agreed. Attached, revised patchset makes these updates. I

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-12-02 Thread Peter Geoghegan
On Tue, Dec 2, 2014 at 5:28 PM, Peter Geoghegan p...@heroku.com wrote: Attached, revised patchset makes these updates. Whoops. Missed some obsolete comments. Here is a third commit that makes a further small modification to one comment. -- Peter Geoghegan From

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-11-25 Thread Robert Haas
On Sun, Nov 9, 2014 at 10:02 PM, Peter Geoghegan p...@heroku.com wrote: On Sat, Oct 11, 2014 at 6:34 PM, Peter Geoghegan p...@heroku.com wrote: Attached patch, when applied, accelerates all tuplesort cases using abbreviated keys, building on previous work here, as well as the patch posted to

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-11-25 Thread Peter Geoghegan
On Tue, Nov 25, 2014 at 4:01 AM, Robert Haas robertmh...@gmail.com wrote: - This appears to needlessly reindent the comments for PG_CACHE_LINE_SIZE. Actually, the word only is removed (because PG_CACHE_LINE_SIZE has a new client now). So it isn't quite the same paragraph as before. - I really

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-11-25 Thread Peter Geoghegan
On Tue, Nov 25, 2014 at 10:38 AM, Peter Geoghegan p...@heroku.com wrote: - Also, I don't think making abbrev_state an enumerated value with two values is really doing anything for us; we could just use a Boolean. I'm wondering if we should actually go a bit further and remove this from the

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-11-09 Thread Peter Geoghegan
On Sat, Oct 11, 2014 at 6:34 PM, Peter Geoghegan p...@heroku.com wrote: Attached patch, when applied, accelerates all tuplesort cases using abbreviated keys, building on previous work here, as well as the patch posted to that other thread. I attach an updated patch set, rebased on top of the

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-10-11 Thread Peter Geoghegan
On Mon, Sep 29, 2014 at 10:34 PM, Peter Geoghegan p...@heroku.com wrote: single sortsupport state patch. You probably noticed that I posted an independently useful patch to make all tuplesort cases use sortsupport [1] - currently, both the B-Tree and CLUSTER cases do not use the sortsupport

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-25 Thread Robert Haas
On Wed, Sep 24, 2014 at 7:04 PM, Peter Geoghegan p...@heroku.com wrote: On Fri, Sep 19, 2014 at 2:54 PM, Peter Geoghegan p...@heroku.com wrote: Probably not - it appears to make very little difference to unoptimized pass-by-reference types whether or not datum1 can be used (see my simulation

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-25 Thread Peter Geoghegan
On Thu, Sep 25, 2014 at 9:21 AM, Robert Haas robertmh...@gmail.com wrote: The top issue on my agenda is figuring out a way to get rid of the extra SortSupport object. Really? I'm surprised. Clearly the need to restart heap tuple copying from scratch, in order to make the datum1 representation

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-25 Thread Robert Haas
On Thu, Sep 25, 2014 at 2:05 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, Sep 25, 2014 at 9:21 AM, Robert Haas robertmh...@gmail.com wrote: The top issue on my agenda is figuring out a way to get rid of the extra SortSupport object. Really? I'm surprised. Clearly the need to restart

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-25 Thread Peter Geoghegan
On Thu, Sep 25, 2014 at 11:53 AM, Robert Haas robertmh...@gmail.com wrote: I haven't looked at that part of the patch in detail yet, so... not really. But I don't see why you'd ever need to restart heap tuple copying. At most you'd need to re-extract datum1 from the tuples you have already

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-25 Thread Robert Haas
On Thu, Sep 25, 2014 at 3:17 PM, Peter Geoghegan p...@heroku.com wrote: To find out how much that optimization buys, you should use tuples with many variable-length columns (say, 50) preceding the text column you're sorting on. I won't be surprised if that turns out to be expensive enough to

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-24 Thread Peter Geoghegan
On Fri, Sep 19, 2014 at 2:54 PM, Peter Geoghegan p...@heroku.com wrote: Probably not - it appears to make very little difference to unoptimized pass-by-reference types whether or not datum1 can be used (see my simulation of Kevin's worst case, for example [1]). Streaming through a not

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-19 Thread Robert Haas
On Tue, Sep 16, 2014 at 4:55 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Sep 16, 2014 at 1:45 PM, Robert Haas robertmh...@gmail.com wrote: Even though our testing seems to indicate that the memcmp() is basically free, I think it would be good to make the effort to avoid doing memcmp()

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-19 Thread Peter Geoghegan
On Fri, Sep 19, 2014 at 9:59 AM, Robert Haas robertmh...@gmail.com wrote: OK, good point. So committed as-is, then, except that I rewrote the comments, which I felt were excessively long for the amount of code. Thanks! I look forward to hearing your thoughts on the open issues with the patch

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-19 Thread Robert Haas
On Thu, Sep 11, 2014 at 8:34 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Sep 9, 2014 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote: I like that I don't have to care about every combination, and can treat abbreviation abortion as the special case with the extra step, in line with

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-19 Thread Peter Geoghegan
On Fri, Sep 19, 2014 at 2:35 PM, Robert Haas robertmh...@gmail.com wrote: Also, shouldn't you go back and fix up those abbreviated keys to point to datum1 again if you abort? Probably not - it appears to make very little difference to unoptimized pass-by-reference types whether or not datum1

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-16 Thread Robert Haas
On Mon, Sep 15, 2014 at 7:21 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Sep 15, 2014 at 11:25 AM, Peter Geoghegan p...@heroku.com wrote: OK, I'll draft a patch for that today, including similar alterations to varstr_cmp() for the benefit of Windows and so on. I attach a much simpler

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-16 Thread Peter Geoghegan
On Tue, Sep 16, 2014 at 1:45 PM, Robert Haas robertmh...@gmail.com wrote: Even though our testing seems to indicate that the memcmp() is basically free, I think it would be good to make the effort to avoid doing memcmp() and then strcoll() and then strncmp(). Seems like it shouldn't be too

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-15 Thread Heikki Linnakangas
On 09/14/2014 11:34 PM, Peter Geoghegan wrote: On Sun, Sep 14, 2014 at 7:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Both values vary in range 5.9 - 6.1 s, so it's fair to say that the useless memcmp() is free with these parameters. Is this the worst case scenario? Other than

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-15 Thread Robert Haas
On Sun, Sep 14, 2014 at 10:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 09/13/2014 11:28 PM, Peter Geoghegan wrote: Anyway, attached rough test program implements what you outline. This is for 30,000 32 byte strings (where just the final two bytes differ). On my laptop, output

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-15 Thread Peter Geoghegan
On Mon, Sep 15, 2014 at 10:17 AM, Robert Haas robertmh...@gmail.com wrote: It strikes me that perhaps we should make this change (rearranging things so that the memcmp tiebreak is run before strcoll) first, before dealing with the rest of the abbreviated keys infrastructure. It appears to be a

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-15 Thread Robert Haas
On Mon, Sep 15, 2014 at 1:34 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Sep 15, 2014 at 10:17 AM, Robert Haas robertmh...@gmail.com wrote: It strikes me that perhaps we should make this change (rearranging things so that the memcmp tiebreak is run before strcoll) first, before dealing

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-15 Thread Peter Geoghegan
On Mon, Sep 15, 2014 at 10:53 AM, Robert Haas robertmh...@gmail.com wrote: I think there's probably more than that to work out, but in any case there's no harm in getting a simple optimization done first before moving on to a complicated one. I guess we never talked about the abort logic in

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-15 Thread Robert Haas
On Mon, Sep 15, 2014 at 1:55 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Sep 15, 2014 at 10:53 AM, Robert Haas robertmh...@gmail.com wrote: I think there's probably more than that to work out, but in any case there's no harm in getting a simple optimization done first before moving on

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-15 Thread Peter Geoghegan
On Mon, Sep 15, 2014 at 11:20 AM, Robert Haas robertmh...@gmail.com wrote: ...looks like about a 10-line patch. We have the data to show that the loss is trivial even in the worst case, and we have or should be able to get data showing that the best-case win is significant even without the

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-15 Thread Peter Geoghegan
On Mon, Sep 15, 2014 at 11:25 AM, Peter Geoghegan p...@heroku.com wrote: OK, I'll draft a patch for that today, including similar alterations to varstr_cmp() for the benefit of Windows and so on. I attach a much simpler patch, that only adds an opportunistic memcmp() == 0 before a possible

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-15 Thread Peter Geoghegan
On Mon, Sep 15, 2014 at 4:21 PM, Peter Geoghegan p...@heroku.com wrote: I attach a much simpler patch, that only adds an opportunistic memcmp() == 0 before a possible strcoll(). Both bttextfastcmp_locale() and varstr_cmp() have the optimization added, since there is no point in leaving anyone
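
The opportunistic memcmp() == 0 check described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: the function name compare_with_memcmp_fastpath is hypothetical, the inputs are assumed to be NUL-terminated C strings, and the real changes live in bttextfastcmp_locale() and varstr_cmp(), which are not reproduced here. The strcmp() tiebreak after strcoll() reflects the general practice of treating only bytewise-identical strings as equal.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical comparator (not PostgreSQL source): compare two
 * NUL-terminated strings under the current LC_COLLATE locale,
 * trying a cheap bytewise equality check before calling strcoll().
 */
static int
compare_with_memcmp_fastpath(const char *a, const char *b)
{
    size_t len_a;
    size_t len_b;
    int    result;

    assert(a != NULL && b != NULL);

    len_a = strlen(a);
    len_b = strlen(b);

    /* Bytewise-identical strings are equal in any locale: skip strcoll(). */
    if (len_a == len_b && memcmp(a, b, len_a) == 0)
        return 0;

    /* Otherwise pay for the locale-aware comparison. */
    result = strcoll(a, b);

    /* If strcoll() reports a tie for non-identical strings, break it bytewise. */
    if (result == 0)
        result = strcmp(a, b);

    return result;
}
```

The fast path only ever helps when the two strings are identical; for strings that differ, the extra memcmp() is the overhead whose cost the thread measures.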

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-14 Thread Heikki Linnakangas
On 09/13/2014 11:28 PM, Peter Geoghegan wrote: Anyway, attached rough test program implements what you outline. This is for 30,000 32 byte strings (where just the final two bytes differ). On my laptop, output looks like this (edited to only show median duration in each case): Got to be careful

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-14 Thread Peter Geoghegan
On Sun, Sep 14, 2014 at 7:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Got to be careful to not let the compiler optimize away microbenchmarks like this. At least with my version of gcc, the strcoll calls get optimized away, as do the memcmp calls, if you don't use the result for

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-13 Thread Peter Geoghegan
On Fri, Sep 12, 2014 at 11:38 AM, Robert Haas robertmh...@gmail.com wrote: Based on discussion thus far it seems that there's a possibility that the trade-off may be different for short strings vs. long strings. If the string is small enough to fit in the L1 CPU cache, then it may be that

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-12 Thread Heikki Linnakangas
On 09/12/2014 12:46 AM, Peter Geoghegan wrote: On Thu, Sep 11, 2014 at 1:50 PM, Robert Haas robertmh...@gmail.com wrote: I think I said pretty clearly that it was. I agree that you did, but it wasn't clear exactly what factors you were asking me to simulate. All factors. Do you want me to

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-12 Thread Robert Haas
On Fri, Sep 12, 2014 at 5:28 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 09/12/2014 12:46 AM, Peter Geoghegan wrote: On Thu, Sep 11, 2014 at 1:50 PM, Robert Haas robertmh...@gmail.com wrote: I think I said pretty clearly that it was. I agree that you did, but it wasn't clear

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-12 Thread Peter Geoghegan
On Fri, Sep 12, 2014 at 11:38 AM, Robert Haas robertmh...@gmail.com wrote: Based on discussion thus far it seems that there's a possibility that the trade-off may be different for short strings vs. long strings. If the string is small enough to fit in the L1 CPU cache, then it may be that

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-12 Thread Robert Haas
On Fri, Sep 12, 2014 at 2:58 PM, Peter Geoghegan p...@heroku.com wrote: On Fri, Sep 12, 2014 at 11:38 AM, Robert Haas robertmh...@gmail.com wrote: Based on discussion thus far it seems that there's a possibility that the trade-off may be different for short strings vs. long strings. If the

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-12 Thread Peter Geoghegan
On Fri, Sep 12, 2014 at 12:02 PM, Robert Haas robertmh...@gmail.com wrote: I think I've said a few times now that I really want to get this additional data before forming an opinion. As a certain Mr. Doyle writes, It is a capital mistake to theorize before one has data. Insensibly one begins

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-11 Thread Peter Geoghegan
On Wed, Sep 10, 2014 at 11:36 AM, Robert Haas robertmh...@gmail.com wrote: No, not really. All you have to do is write a little test program to gather the information. I don't think a little test program is useful - IMV it's too much of a simplification to suppose that a strcoll() has a fixed

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-11 Thread Robert Haas
On Thu, Sep 11, 2014 at 4:13 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Sep 10, 2014 at 11:36 AM, Robert Haas robertmh...@gmail.com wrote: No, not really. All you have to do is write a little test program to gather the information. I don't think a little test program is useful - IMV

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-11 Thread Peter Geoghegan
On Thu, Sep 11, 2014 at 1:50 PM, Robert Haas robertmh...@gmail.com wrote: I think I said pretty clearly that it was. I agree that you did, but it wasn't clear exactly what factors you were asking me to simulate. It still isn't. Do you want me to compare the same string a million times in a loop,

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-11 Thread Peter Geoghegan
On Tue, Sep 9, 2014 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote: I like that I don't have to care about every combination, and can treat abbreviation abortion as the special case with the extra step, in line with how I think of the optimization conceptually. Does that make sense? No.

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-10 Thread Peter Geoghegan
On Tue, Sep 9, 2014 at 2:00 PM, Robert Haas robertmh...@gmail.com wrote: Boiled down, what you're saying is that you might have a set that contains lots of duplicates in general, but not very many where the abbreviated-keys also match. Sure, that's true. Abbreviated keys are not used in the

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-10 Thread Robert Haas
On Wed, Sep 10, 2014 at 1:36 PM, Peter Geoghegan p...@heroku.com wrote: In order to know how much we're giving up in that case, we need the exact number I asked you to provide in my previous email: the ratio of the cost of strcoll() to the cost of memcmp(). I see that you haven't chosen to
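
A minimal sketch of the kind of "little test program" being asked for here, assuming a plain loop over one pair of strings is an acceptable first approximation. The helper names time_strcoll and time_memcmp are hypothetical and do not come from the thread or from PostgreSQL. The sketch only measures raw call cost and does not model cache effects during a real sort.

```c
#include <string.h>
#include <time.h>

/*
 * Hypothetical micro-benchmark helpers (names are illustrative only).
 * Each returns the CPU seconds spent on `iters` comparisons of a and b.
 * The volatile pointers and sink keep the compiler from hoisting or
 * discarding the calls.
 */
static double
time_strcoll(const char *a, const char *b, long iters)
{
    const char *volatile pa = a;
    const char *volatile pb = b;
    volatile int sink = 0;
    clock_t     start = clock();

    for (long i = 0; i < iters; i++)
        sink = strcoll(pa, pb);     /* locale-aware comparison */
    (void) sink;
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}

static double
time_memcmp(const char *a, const char *b, size_t len, long iters)
{
    const char *volatile pa = a;
    const char *volatile pb = b;
    volatile int sink = 0;
    clock_t     start = clock();

    for (long i = 0; i < iters; i++)
        sink = memcmp(pa, pb, len); /* raw bytewise comparison */
    (void) sink;
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

To get a ratio, a caller would first call setlocale(LC_COLLATE, "") so that strcoll() uses a real collation, then divide the two timings for the same inputs and iteration count. Comparing the same pair repeatedly keeps both strings in cache, so results for short and long strings should be gathered separately and read with that limitation in mind.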

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-09 Thread Robert Haas
On Thu, Sep 4, 2014 at 5:46 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, Sep 4, 2014 at 2:18 PM, Robert Haas robertmh...@gmail.com wrote: Eh, maybe? I'm not sure why the case where we're using abbreviated keys should be different than the case we're not. In either case this is a

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-09 Thread Robert Haas
On Fri, Sep 5, 2014 at 10:45 PM, Peter Geoghegan p...@heroku.com wrote: While I gave serious consideration to your idea of having a dedicated abbreviation comparator, and not duplicating sortsupport state when abbreviated keys are used (going so far as to almost fully implement the idea), I

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-06 Thread Peter Geoghegan
On Fri, Sep 5, 2014 at 7:45 PM, Peter Geoghegan p...@heroku.com wrote: Attached additional patches are intended to be applied on top of most of the patches posted on September 2nd [1]. I attach another amendment/delta patch, intended to be applied on top of what was posted yesterday. I

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-06 Thread Peter Geoghegan
On Sat, Sep 6, 2014 at 3:01 PM, Peter Geoghegan p...@heroku.com wrote: I attach another amendment/delta patch Attached is another amendment to the patch set. With the recent addition of abbreviation support on 32-bit platforms, we should just hash the Datum representation as a uint32 on

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-05 Thread Peter Geoghegan
On Wed, Sep 3, 2014 at 2:44 PM, Peter Geoghegan p...@heroku.com wrote: I guess it should still be a configure option, then. Or maybe there should just be a USE_ABBREV_KEYS macro within pg_config_manual.h. Attached additional patches are intended to be applied on top of most of the patches

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-04 Thread Robert Haas
On Tue, Sep 2, 2014 at 10:27 PM, Peter Geoghegan p...@heroku.com wrote: * Still doesn't address the open question of whether or not we should optimistically always try memcmp() == 0 on tiebreak. I still lean towards yes. Let m be the cost of a memcmp() that fails near the end of the strings;

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-04 Thread Robert Haas
On Wed, Sep 3, 2014 at 5:44 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Sep 3, 2014 at 2:18 PM, Robert Haas robertmh...@gmail.com wrote: My suggestion is to remove the special cases for Darwin and 32-bit systems and see how it goes. I guess it should still be a configure option, then.

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-04 Thread Peter Geoghegan
On Thu, Sep 4, 2014 at 9:19 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Sep 2, 2014 at 10:27 PM, Peter Geoghegan p...@heroku.com wrote: * Still doesn't address the open question of whether or not we should optimistically always try memcmp() == 0 on tiebreak. I still lean towards yes.

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-04 Thread Robert Haas
On Thu, Sep 4, 2014 at 2:12 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, Sep 4, 2014 at 9:19 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Sep 2, 2014 at 10:27 PM, Peter Geoghegan p...@heroku.com wrote: * Still doesn't address the open question of whether or not we should

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-04 Thread Peter Geoghegan
On Thu, Sep 4, 2014 at 2:18 PM, Robert Haas robertmh...@gmail.com wrote: Eh, maybe? I'm not sure why the case where we're using abbreviated keys should be different than the case we're not. In either case this is a straightforward trade-off: if we do a memcmp() before strcoll(), we win if it
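
The trade-off under discussion can be sketched roughly as follows, assuming NUL-terminated strings whose lengths are already known. The function name cmp_memcmp_first is hypothetical, and this is not the patch's actual comparator. It only shows the shape of the idea: a cheap bytewise equality check first, with strcoll() as the fallback.

```c
#include <string.h>

/*
 * Hypothetical comparator: try a cheap bytewise equality test before
 * paying for a locale-aware strcoll() call.
 */
static int
cmp_memcmp_first(const char *a, size_t alen, const char *b, size_t blen)
{
    int         result;

    /* Fast path: bytewise-identical strings are equal in any collation. */
    if (alen == blen && memcmp(a, b, alen) == 0)
        return 0;

    /* Slow path: full comparison under the current LC_COLLATE locale. */
    result = strcoll(a, b);

    /*
     * Assumption for this sketch: break collation ties bytewise so that
     * strings which differ in their bytes never compare as equal.
     */
    if (result == 0)
        result = strcmp(a, b);
    return result;
}
```

The memcmp() is wasted work whenever the strings differ, so the fast path pays off only when equal strings are common enough to offset that cost, which is why the relative cost of strcoll() and memcmp() matters to the argument.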

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-04 Thread Peter Geoghegan
On Thu, Sep 4, 2014 at 11:12 AM, Peter Geoghegan p...@heroku.com wrote: What I consider an open question is whether or not we should do that on the first call when there is no abbreviated comparison, such as on the second or subsequent attribute in a multi-column sort, in the hope that

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-04 Thread Peter Geoghegan
On Thu, Sep 4, 2014 at 5:07 PM, Peter Geoghegan p...@heroku.com wrote: So I came up with what I imagined to be an unsympathetic case: BTW, this cities data is still available from: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/data/cities.dump

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-09-03 Thread Robert Haas
On Tue, Sep 2, 2014 at 4:41 PM, Peter Geoghegan p...@heroku.com wrote: HyperLogLog isn't sample-based - it's useful for streaming a set and accurately tracking its cardinality with fixed overhead. OK. Is it the right decision to suppress the abbreviated-key optimization unconditionally on
