On Wed, Jan 21, 2015 at 2:22 AM, Peter Geoghegan wrote:
> You'll probably prefer the attached. This patch works by disabling
> abbreviation, but only after writing out runs, with the final merge
> left to go. That way, it doesn't matter when abbreviated keys are not
> read back from disk (or regen
On Fri, Jan 23, 2015 at 2:18 AM, David Rowley wrote:
> On 20 January 2015 at 17:10, Peter Geoghegan wrote:
>>
>> On Mon, Jan 19, 2015 at 7:47 PM, Michael Paquier
>> wrote:
>>
>> > With your patch applied, the failure with MSVC disappeared, but there
>> > is still a warning showing up:
>> > (ClCo
On 20 January 2015 at 17:10, Peter Geoghegan wrote:
> On Mon, Jan 19, 2015 at 7:47 PM, Michael Paquier
> wrote:
>
> > With your patch applied, the failure with MSVC disappeared, but there
> > is still a warning showing up:
> > (ClCompile target) ->
> > src\backend\lib\hyperloglog.c(73): warnin
> "Peter" == Peter Geoghegan writes:
Peter> Okay, then. I concede the point: We should support the datum
Peter> case as you outline, since it is simpler than any
Peter> alternative. It probably won't even be necessary to formalize
Peter> the idea that finished abbreviated keys must be pas
On Wed, Jan 21, 2015 at 2:11 PM, Peter Geoghegan wrote:
> Okay, then. I concede the point: We should support the datum case as
> you outline, since it is simpler than any alternative. It probably
> won't even be necessary to formalize the idea that finished
> abbreviated keys must be pass-by-value
On Wed, Jan 21, 2015 at 4:44 AM, Andrew Gierth
wrote:
> Now, I follow this general principle that someone who is not doing the
> work should never say "X is easy" to someone who _is_ doing it, unless
> they're prepared to at least outline the solution on request or
> otherwise contribute. So see
> "Peter" == Peter Geoghegan writes:
Peter> Basically, the intersection of the datum sort case with
Peter> abbreviated keys seems complicated.
Not to me. To me it seems completely trivial.
Now, I follow this general principle that someone who is not doing the
work should never say "X is e
> "Peter" == Peter Geoghegan writes:
Peter> You'll probably prefer the attached. This patch works by
Peter> disabling abbreviation, but only after writing out runs, with
Peter> the final merge left to go. That way, it doesn't matter when
Peter> abbreviated keys are not read back from disk
On Tue, Jan 20, 2015 at 6:39 PM, Peter Geoghegan wrote:
> On Tue, Jan 20, 2015 at 6:34 PM, Robert Haas wrote:
>> That might be OK. Probably needs a bit of performance testing to see
>> how it looks.
>
> Well, we're still only doing it when we do our final merge. So that's
> "only" doubling the n
On Tue, Jan 20, 2015 at 6:34 PM, Robert Haas wrote:
> That might be OK. Probably needs a bit of performance testing to see
> how it looks.
Well, we're still only doing it when we do our final merge. So that's
"only" doubling the number of conversions required, which if we're
blocked on I/O might
On Tue, Jan 20, 2015 at 9:33 PM, Peter Geoghegan wrote:
> On Tue, Jan 20, 2015 at 6:30 PM, Robert Haas wrote:
>> I don't want to change the on-disk format for tapes without a lot more
>> discussion. Can you come up with a fix that avoids that for now?
>
> A more conservative approach would be to
On Tue, Jan 20, 2015 at 6:30 PM, Robert Haas wrote:
> I don't want to change the on-disk format for tapes without a lot more
> discussion. Can you come up with a fix that avoids that for now?
A more conservative approach would be to perform conversion on-the-fly
once more. That wouldn't be paten
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan wrote:
> On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas wrote:
>> I was assuming we were going to fix this by undoing the abbreviation
>> (as in the abort case) when we spill to disk, and not bothering with
>> it thereafter.
>
> The spill-to-disk cas
On Tue, Jan 20, 2015 at 5:46 PM, Peter Geoghegan wrote:
> Would you prefer it if the spill-to-disk case
> aborted in the style of low entropy keys? That doesn't seem
> significantly safer than this, and it is certainly not acceptable from a
> performance perspective.
BTW, I can write that patch if t
On Tue, Jan 20, 2015 at 5:42 PM, Robert Haas wrote:
> On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan wrote:
>> On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas wrote:
>>> I was assuming we were going to fix this by undoing the abbreviation
>>> (as in the abort case) when we spill to disk, and not
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan wrote:
> On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas wrote:
>> I was assuming we were going to fix this by undoing the abbreviation
>> (as in the abort case) when we spill to disk, and not bothering with
>> it thereafter.
>
> The spill-to-disk cas
On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas wrote:
> I was assuming we were going to fix this by undoing the abbreviation
> (as in the abort case) when we spill to disk, and not bothering with
> it thereafter.
The spill-to-disk case is at least as compelling as the internal sort
case. The overhe
On Tue, Jan 20, 2015 at 7:07 PM, Peter Geoghegan wrote:
> On Tue, Jan 20, 2015 at 3:57 PM, Peter Geoghegan wrote:
>> It's certainly possible to fix Andrew's test case with the attached.
>> I'm not sure that that's the appropriate fix, though: there is
>> probably a case to be made for not botheri
On Tue, Jan 20, 2015 at 3:57 PM, Peter Geoghegan wrote:
> It's certainly possible to fix Andrew's test case with the attached.
> I'm not sure that that's the appropriate fix, though: there is
> probably a case to be made for not bothering with abbreviation once
> we've read tuples in for the final
On Tue, Jan 20, 2015 at 3:34 PM, Peter Geoghegan wrote:
> On Tue, Jan 20, 2015 at 3:34 PM, Robert Haas wrote:
>> Dear me. Peter, can you fix this RSN?
>
> Investigating.
It's certainly possible to fix Andrew's test case with the attached.
I'm not sure that that's the appropriate fix, though: th
On Tue, Jan 20, 2015 at 3:34 PM, Robert Haas wrote:
> Dear me. Peter, can you fix this RSN?
Investigating.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jan 20, 2015 at 3:33 PM, Robert Haas wrote:
> Peter, this made bowerbird (Windows 8/Visual Studio) build, but it's
> failing make check. Ditto hamerkop (Windows 2k8/VC++) and currawong
> (Windows XP Pro/MSVC++). jacana (Windows 8/gcc) and brolga (Windows
> XP Pro/cygwin) are unhappy too,
On Tue, Jan 20, 2015 at 6:27 PM, Andrew Gierth
wrote:
>> "Robert" == Robert Haas writes:
> Robert> All right, it seems Tom is with you on that point, so after
> Robert> some study, I've committed this with very minor modifications.
>
> While hacking up a patch to demonstrate the simplicity
On Tue, Jan 20, 2015 at 10:54 AM, Robert Haas wrote:
> On Mon, Jan 19, 2015 at 9:29 PM, Peter Geoghegan wrote:
>> I think that the attached patch should at least fix that much. Maybe
>> the problem on the other animal is also explained by the lack of this,
>> since there could also be a MinGW-ish
> "Robert" == Robert Haas writes:
Robert> All right, it seems Tom is with you on that point, so after
Robert> some study, I've committed this with very minor modifications.
While hacking up a patch to demonstrate the simplicity of extending this
to the Datum sorter, I seem to have run into
On Tue, Jan 20, 2015 at 2:00 PM, Peter Geoghegan wrote:
> Maybe that's the
> wrong way of fixing that, but for now I don't think it's acceptable
> that abbreviation isn't always used in certain cases where it could
> make sense (e.g. not for simple GroupAggregates with a single
> attribute -- only
On Tue, Jan 20, 2015 at 3:46 AM, Andrew Gierth
wrote:
> The comment in tuplesort_begin_datum that abbreviation can't be used
> seems wrong to me; why is the copy of the original value pointed to by
> stup->tuple (in the case of by-reference types, and abbreviation is
> obviously not needed for by-
On Mon, Jan 19, 2015 at 9:29 PM, Peter Geoghegan wrote:
> I think that the attached patch should at least fix that much. Maybe
> the problem on the other animal is also explained by the lack of this,
> since there could also be a MinGW-ish strxfrm_l(), I suppose.
Committed that, rather blindly, s
> "Robert" == Robert Haas writes:
Robert> All right, it seems Tom is with you on that point, so after
Robert> some study, I've committed this with very minor modifications.
This caught my eye (thanks to conflict with GS patch):
* In the future, we should consider forcing the
* tuplesort
On Mon, Jan 19, 2015 at 7:47 PM, Michael Paquier
wrote:
> On MinGW-32, not that I know of:
> $ find . -name *.h | xgrep strxfrm_l
> ./lib/gcc/mingw32/4.8.1/include/c++/mingw32/bits/c++config.h:/* Define if
> strxfr
> m_l is available in . */
> ./mingw32/lib/gcc/mingw32/4.8.1/include/c++/mingw32/b
On Tue, Jan 20, 2015 at 11:29 AM, Peter Geoghegan wrote:
> On Mon, Jan 19, 2015 at 5:59 PM, Peter Geoghegan wrote:
>> On Mon, Jan 19, 2015 at 5:33 PM, Alvaro Herrera
>> wrote:
>>> You did notice that bowerbird isn't building, right?
>>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowe
On Mon, Jan 19, 2015 at 5:59 PM, Peter Geoghegan wrote:
> On Mon, Jan 19, 2015 at 5:33 PM, Alvaro Herrera
> wrote:
>> You did notice that bowerbird isn't building, right?
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2015-01-19%2023%3A54%3A46
>
> Yeah. Looks like strxfrm_
On Mon, Jan 19, 2015 at 5:33 PM, Alvaro Herrera
wrote:
> You did notice that bowerbird isn't building, right?
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2015-01-19%2023%3A54%3A46
Yeah. Looks like strxfrm_l() isn't available on the animal, for whatever reason.
--
Peter
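Since strxfrm_l() is not available on every platform, here is a minimal sketch that uses only the standard C strxfrm(), which relies on the current LC_COLLATE locale instead of an explicit locale_t. It shows the property that abbreviation depends on: comparing transformed strings with strcmp() yields the same sign as calling strcoll() on the originals. The helper name sketch_strxfrm_dup is hypothetical and does not come from PostgreSQL.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd copy of src transformed with
 * strxfrm() under the current LC_COLLATE locale, or NULL if allocation
 * fails. strcmp() on two results orders them as strcoll() orders the inputs.
 */
static char *
sketch_strxfrm_dup(const char *src)
{
    /* First pass: with n == 0 the destination may be NULL; returns length needed. */
    size_t needed = strxfrm(NULL, src, 0);
    char *buf = malloc(needed + 1);

    if (buf == NULL)
        return NULL;

    /* Second pass: the buffer has room for the result plus its terminator. */
    strxfrm(buf, src, needed + 1);
    return buf;
}
```

The two-pass pattern avoids guessing a buffer size. The transformed output can be much longer than the input, which is part of why only a short prefix of it is worth keeping as an abbreviated key.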
Peter Geoghegan wrote:
> It appears that the buildfarm animal brolga isn't happy about this
> patch. I'm not sure why, since I thought we already figured out bugs
> or other inconsistencies in various strxfrm() implementations.
You did notice that bowerbird isn't building, right?
http://buildfarm
On Mon, Jan 19, 2015 at 5:43 PM, Peter Geoghegan wrote:
> It appears that the buildfarm animal brolga isn't happy about this
> patch. I'm not sure why, since I thought we already figured out bugs
> or other inconsistencies in various strxfrm() implementations.
Well, the first thing that comes to
On Mon, Jan 19, 2015 at 12:33 PM, Robert Haas wrote:
> All right, it seems Tom is with you on that point, so after some
> study, I've committed this with very minor modifications. Sorry for
> the long delay.
Thank you very much for your help with this! I appreciate it.
> I have not committed th
* Robert Haas (robertmh...@gmail.com) wrote:
> On the PPC64 machine I normally use for performance testing, it takes
> about 6.3 seconds to build the index with the commit just before this
> one. With this commit, it drops to 1.9 seconds. That's more than a
> 3x speedup!
>
> Now, if I change the
On Mon, Jan 19, 2015 at 3:33 PM, Robert Haas wrote:
> All right, it seems Tom is with you on that point, so after some
> study, I've committed this with very minor modifications. Sorry for
> the long delay. I have not committed the 0002 patch, though, because
> I haven't studied that enough yet
On Tue, Dec 2, 2014 at 8:28 PM, Peter Geoghegan wrote:
> On Tue, Dec 2, 2014 at 2:16 PM, Peter Geoghegan wrote:
>> On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas wrote:
>>> Well, maybe you should make the updates we've agreed on and I can take
>>> another look at it.
>>
>> Agreed.
>
> Attached, rev
On Wed, Dec 3, 2014 at 10:43 AM, Peter Geoghegan wrote:
> On Tue, Dec 2, 2014 at 5:28 PM, Peter Geoghegan wrote:
>> Attached, revised patchset makes these updates.
>
> Whoops. Missed some obsolete comments. Here is a third commit that
> makes a further small modification to one comment.
Moving th
There is an interesting thread about strcoll() overhead over on -general:
http://www.postgresql.org/message-id/cab25xexnondrmc1_cy3jvmb0tmydm38ef9q2d7xla0rbncj...@mail.gmail.com
My guess was that this person experienced a rather unexpected downside
of spilling to disk when sorting on a text attri
On Tue, Dec 2, 2014 at 1:21 PM, Peter Geoghegan wrote:
> Incidentally, I think that an under-appreciated possible source of
> regressions here is that attributes abbreviated have a strong
> physical/logical correlation. I could see a small regression for one
> such case even though my cost model i
On Tue, Dec 2, 2014 at 5:44 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas wrote:
>>> Right, and what I'm saying is that maybe the "applicability" flag
>>> shouldn't be stored in the SortSupport object, but passed down as an
>>> argument.
>
>> But th
On Tue, Dec 2, 2014 at 5:28 PM, Peter Geoghegan wrote:
> Attached, revised patchset makes these updates.
Whoops. Missed some obsolete comments. Here is a third commit that
makes a further small modification to one comment.
--
Peter Geoghegan
From 8d1aba80f95e05742047cba5bd83d8f17aa5ef37 Mon Sep
On Tue, Dec 2, 2014 at 2:16 PM, Peter Geoghegan wrote:
> On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas wrote:
>> Well, maybe you should make the updates we've agreed on and I can take
>> another look at it.
>
> Agreed.
Attached, revised patchset makes these updates. I continue to use the
sortsuppo
Peter Geoghegan writes:
> On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas wrote:
>> Right, and what I'm saying is that maybe the "applicability" flag
>> shouldn't be stored in the SortSupport object, but passed down as an
>> argument.
> But then how does that information get to any given sortsupport
On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas wrote:
> Right, and what I'm saying is that maybe the "applicability" flag
> shouldn't be stored in the SortSupport object, but passed down as an
> argument.
But then how does that information get to any given sortsupport
routine? That's the place that
On Tue, Dec 2, 2014 at 5:16 PM, Peter Geoghegan wrote:
> On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas wrote:
>> Well, maybe you should make the updates we've agreed on and I can take
>> another look at it.
>
> Agreed.
>
>> But I didn't think that I was proposing to change
>> anything about the lev
On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas wrote:
> Well, maybe you should make the updates we've agreed on and I can take
> another look at it.
Agreed.
> But I didn't think that I was proposing to change
> anything about the level at which the decision about whether to
> abbreviate or not was
On Tue, Dec 2, 2014 at 4:21 PM, Peter Geoghegan wrote:
>>> I'm not sure about that. I'd prefer to have tuplesort (and one or two
>>> other sites) set the "abbreviation is possible in principle" flag.
>>> Otherwise, sortsupport needs to assume that the leading attribute is
>>> going to be the abbre
On Tue, Dec 2, 2014 at 1:00 PM, Robert Haas wrote:
> I'd prefer not to have a #define in pg_config_manual.h. Only stuff
> that we expect a reasonably decent number of users to need to change
> should be in that file, and this is too marginal for that. If anybody
> other than the developers of th
On Tue, Nov 25, 2014 at 1:38 PM, Peter Geoghegan wrote:
> On Tue, Nov 25, 2014 at 4:01 AM, Robert Haas wrote:
>> - This appears to needlessly reindent the comments for PG_CACHE_LINE_SIZE.
>
> Actually, the word "only" is removed (because PG_CACHE_LINE_SIZE has a
> new client now). So it isn't qui
On Tue, Nov 25, 2014 at 4:01 AM, Robert Haas wrote:
> There's a lot of stuff in this patch I'm still trying to digest
I spotted a bug in the most recent revision. Mea culpa.
I think that the new field Tuplesortstate.abbrevNext should be an
int64, not an int. The fact that Tuplesortstate.memtupco
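Why the width can matter, as a hypothetical sketch: assume the tuple count at which abbreviation is next reconsidered doubles every time it is reached. On a very large sort the doubled value can then overflow a 32-bit int well before the sort ends. The names SketchAbbrevState and sketch_time_to_check are made up for illustration, and this is not the committed tuplesort.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state: only the field relevant to the int64 point. */
typedef struct SketchAbbrevState
{
    int64_t abbrevNext;     /* tuple count at which to next reconsider abbreviation */
} SketchAbbrevState;

/*
 * Return true when the caller should re-evaluate whether abbreviation is
 * paying off. The checkpoint doubles each time it is reached, so checks
 * become geometrically rarer. Keeping it in int64_t means repeated doubling
 * does not overflow the way a 32-bit int would on huge inputs.
 */
static bool
sketch_time_to_check(SketchAbbrevState *state, int64_t tuples_seen)
{
    if (tuples_seen < state->abbrevNext)
        return false;
    state->abbrevNext *= 2;
    return true;
}
```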
On Tue, Nov 25, 2014 at 10:38 AM, Peter Geoghegan wrote:
>> - Also, I don't think making abbrev_state an enumerated value with two
>> values is really doing anything for us; we could just use a Boolean.
>> I'm wondering if we should actually go a bit further and remove this
>> from the SortSupport
On Tue, Nov 25, 2014 at 4:01 AM, Robert Haas wrote:
> - This appears to needlessly reindent the comments for PG_CACHE_LINE_SIZE.
Actually, the word "only" is removed (because PG_CACHE_LINE_SIZE has a
new client now). So it isn't quite the same paragraph as before.
> - I really don't think we nee
On Sun, Nov 9, 2014 at 10:02 PM, Peter Geoghegan wrote:
> On Sat, Oct 11, 2014 at 6:34 PM, Peter Geoghegan wrote:
>> Attached patch, when applied, accelerates all tuplesort cases using
>> abbreviated keys, building on previous work here, as well as the patch
>> posted to that other thread.
>
> I
On Sat, Oct 11, 2014 at 6:34 PM, Peter Geoghegan wrote:
> Attached patch, when applied, accelerates all tuplesort cases using
> abbreviated keys, building on previous work here, as well as the patch
> posted to that other thread.
I attach an updated patch set, rebased on top of the master branch'
On Mon, Sep 29, 2014 at 10:34 PM, Peter Geoghegan wrote:
> .
You probably noticed that I posted an independently useful patch to
make all tuplesort cases use sortsupport [1] - currently, both the
B-Tree and CLUSTER cases do not use the sortsupport infrastructure
more or less for no good reason. T
On Thu, Sep 25, 2014 at 1:36 PM, Robert Haas wrote:
> (concerns about a second sortsupport state)
I think I may have underestimated the cost of not having
sorttuple.datum1 with a pointer-to-text representation available in
cases such as the one you describe.
Attached revision introduces an alterna
On Thu, Sep 25, 2014 at 3:17 PM, Peter Geoghegan wrote:
>> To find out how much that optimization buys, you
>> should use tuples with many variable-length columns (say, 50)
>> preceding the text column you're sorting on. I won't be surprised if
>> that turns out to be expensive enough to be worth
On Thu, Sep 25, 2014 at 11:53 AM, Robert Haas wrote:
> I haven't looked at that part of the patch in detail yet, so... not
> really. But I don't see why you'd ever need to restart heap tuple
> copying. At most you'd need to re-extract datum1 from the tuples you
> have already copied.
Well, okay
On Thu, Sep 25, 2014 at 2:05 PM, Peter Geoghegan wrote:
> On Thu, Sep 25, 2014 at 9:21 AM, Robert Haas wrote:
>> The top issue on my agenda is figuring out a way to get rid of the
>> extra SortSupport object.
>
> Really? I'm surprised. Clearly the need to restart heap tuple copying
> from scratch
On Thu, Sep 25, 2014 at 9:21 AM, Robert Haas wrote:
> The top issue on my agenda is figuring out a way to get rid of the
> extra SortSupport object.
Really? I'm surprised. Clearly the need to restart heap tuple copying
from scratch, in order to make the datum1 representation consistent,
rather th
On Wed, Sep 24, 2014 at 7:04 PM, Peter Geoghegan wrote:
> On Fri, Sep 19, 2014 at 2:54 PM, Peter Geoghegan wrote:
>> Probably not - it appears to make very little difference to
>> unoptimized pass-by-reference types whether or not datum1 can be used
>> (see my simulation of Kevin's worst case, fo
On Fri, Sep 19, 2014 at 2:54 PM, Peter Geoghegan wrote:
> Probably not - it appears to make very little difference to
> unoptimized pass-by-reference types whether or not datum1 can be used
> (see my simulation of Kevin's worst case, for example [1]). Streaming
> through a not inconsiderable propo
On Fri, Sep 19, 2014 at 2:35 PM, Robert Haas wrote:
> Also, shouldn't you go back and fix up
> those abbreviated keys to point to datum1 again if you abort?
Probably not - it appears to make very little difference to
unoptimized pass-by-reference types whether or not datum1 can be used
(see my si
On Thu, Sep 11, 2014 at 8:34 PM, Peter Geoghegan wrote:
> On Tue, Sep 9, 2014 at 2:25 PM, Robert Haas wrote:
>>> I like that I don't have to care about every combination, and can
>>> treat abbreviation abortion as the special case with the extra step,
>>> in line with how I think of the optimizat
On Fri, Sep 19, 2014 at 9:59 AM, Robert Haas wrote:
> OK, good point. So committed as-is, then, except that I rewrote the
> comments, which I felt were excessively long for the amount of code.
Thanks!
I look forward to hearing your thoughts on the open issues with the
patch as a whole.
--
Pete
On Tue, Sep 16, 2014 at 4:55 PM, Peter Geoghegan wrote:
> On Tue, Sep 16, 2014 at 1:45 PM, Robert Haas wrote:
>> Even though our testing seems to indicate that the memcmp() is
>> basically free, I think it would be good to make the effort to avoid
>> doing memcmp() and then strcoll() and then str
On Tue, Sep 16, 2014 at 1:45 PM, Robert Haas wrote:
> Even though our testing seems to indicate that the memcmp() is
> basically free, I think it would be good to make the effort to avoid
> doing memcmp() and then strcoll() and then strncmp(). Seems like it
> shouldn't be too hard.
Really? The t
On Mon, Sep 15, 2014 at 7:21 PM, Peter Geoghegan wrote:
> On Mon, Sep 15, 2014 at 11:25 AM, Peter Geoghegan wrote:
>> OK, I'll draft a patch for that today, including similar alterations
>> to varstr_cmp() for the benefit of Windows and so on.
>
> I attach a much simpler patch, that only adds an
On Mon, Sep 15, 2014 at 4:21 PM, Peter Geoghegan wrote:
> I attach a much simpler patch, that only adds an opportunistic
> "memcmp() == 0" before a possible strcoll(). Both
> bttextfastcmp_locale() and varstr_cmp() have the optimization added,
> since there is no point in leaving anyone out for t
On Mon, Sep 15, 2014 at 11:25 AM, Peter Geoghegan wrote:
> OK, I'll draft a patch for that today, including similar alterations
> to varstr_cmp() for the benefit of Windows and so on.
I attach a much simpler patch, that only adds an opportunistic
"memcmp() == 0" before a possible strcoll(). Both
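The opportunistic check can be sketched as follows. This is a simplified stand-alone illustration and not the patch itself. sketch_text_cmp is a hypothetical name, and the sketch assumes length-counted inputs with no embedded NUL bytes, which must be copied into NUL-terminated buffers before strcoll() can be called. Ties reported by strcoll() are broken with strcmp() here, so only byte-identical strings compare as equal.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical comparator for length-counted strings (no embedded NULs).
 * Byte-identical strings are equal under any collation, so a cheap memcmp()
 * lets us skip the copies and the expensive strcoll() entirely.
 */
static int
sketch_text_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    char *abuf, *bbuf;
    int result;

    /* Opportunistic fast path: identical bytes mean equal. */
    if (alen == blen && memcmp(a, b, alen) == 0)
        return 0;

    /* strcoll() needs NUL-terminated input, so make terminated copies. */
    abuf = malloc(alen + 1);
    bbuf = malloc(blen + 1);
    if (abuf == NULL || bbuf == NULL)
    {
        free(abuf);
        free(bbuf);
        abort();            /* a real implementation would report an error */
    }
    memcpy(abuf, a, alen);
    abuf[alen] = '\0';
    memcpy(bbuf, b, blen);
    bbuf[blen] = '\0';

    result = strcoll(abuf, bbuf);
    /* Break ties between non-identical strings with a byte comparison. */
    if (result == 0)
        result = strcmp(abuf, bbuf);

    free(abuf);
    free(bbuf);
    return result;
}
```

The fast path only pays off when equal values are common. When they are not, it costs one memcmp() per comparison, and that worst-case cost is what the benchmarks further down the thread measure.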
On Mon, Sep 15, 2014 at 11:20 AM, Robert Haas wrote:
> ...looks like about a 10-line patch. We have the data to show that
> the loss is trivial even in the worst case, and we have or should be
> able to get data showing that the best-case win is significant even
> without the abbreviated key stuf
On Mon, Sep 15, 2014 at 1:55 PM, Peter Geoghegan wrote:
> On Mon, Sep 15, 2014 at 10:53 AM, Robert Haas wrote:
>> I think there's probably more than that to work out, but in any case
>> there's no harm in getting a simple optimization done first before
>> moving on to a complicated one.
>
> I gue
On Mon, Sep 15, 2014 at 10:53 AM, Robert Haas wrote:
> I think there's probably more than that to work out, but in any case
> there's no harm in getting a simple optimization done first before
> moving on to a complicated one.
I guess we never talked about the abort logic in all that much detail.
On Mon, Sep 15, 2014 at 1:34 PM, Peter Geoghegan wrote:
> On Mon, Sep 15, 2014 at 10:17 AM, Robert Haas wrote:
>> It strikes me that perhaps we should make this change (rearranging
>> things so that the memcmp tiebreak is run before strcoll) first,
>> before dealing with the rest of the abbreviat
On Mon, Sep 15, 2014 at 10:17 AM, Robert Haas wrote:
> It strikes me that perhaps we should make this change (rearranging
> things so that the memcmp tiebreak is run before strcoll) first,
> before dealing with the rest of the abbreviated keys infrastructure.
> It appears to be a separate improvem
On Sun, Sep 14, 2014 at 10:37 AM, Heikki Linnakangas
wrote:
> On 09/13/2014 11:28 PM, Peter Geoghegan wrote:
>> Anyway, attached rough test program implements what you outline. This
>> is for 30,000 32 byte strings (where just the final two bytes differ).
>> On my laptop, output looks like this (e
On 09/14/2014 11:34 PM, Peter Geoghegan wrote:
On Sun, Sep 14, 2014 at 7:37 AM, Heikki Linnakangas
wrote:
Both values vary in range 5.9 - 6.1 s, so it's fair to say that the useless
memcmp() is free with these parameters.
Is this the worst case scenario?
Other than pushing the differences mu
On Sun, Sep 14, 2014 at 7:37 AM, Heikki Linnakangas
wrote:
> Got to be careful to not let the compiler optimize away microbenchmarks like
> this. At least with my version of gcc, the strcoll calls get optimized away,
> as do the memcmp calls, if you don't use the result for anything. Clang was
> e
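Here is a minimal sketch of one way to stop the compiler from optimizing away such a microbenchmark, assuming a C99 compiler. The input pointers are read through volatile variables on every iteration, which prevents the compiler from assuming the same call repeats and hoisting it out of the loop. Every result is also consumed. The function name sketch_time_memcmp is hypothetical. clock() measures processor time, so it is only suitable for rough comparisons, and the same pattern would apply to strcoll().

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical microbenchmark: time `iterations` calls of memcmp(a, b, n).
 * Returns elapsed processor seconds; *sign_sum receives the sum of the
 * signs of all results, so every result is demonstrably used.
 */
static double
sketch_time_memcmp(const char *a, const char *b, size_t n,
                   long iterations, long *sign_sum)
{
    /* Volatile pointers are re-read each iteration, so calls cannot be hoisted. */
    const char *volatile pa = a;
    const char *volatile pb = b;
    long acc = 0;
    clock_t start = clock();

    for (long i = 0; i < iterations; i++)
    {
        int r = memcmp(pa, pb, n);

        acc += (r > 0) - (r < 0);   /* consume every result */
    }
    *sign_sum = acc;
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```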
On 09/13/2014 11:28 PM, Peter Geoghegan wrote:
Anyway, attached rough test program implements what you outline. This
is for 30,000 32 byte strings (where just the final two bytes differ).
On my laptop, output looks like this (edited to only show median
duration in each case):
Got to be careful
On Fri, Sep 12, 2014 at 11:38 AM, Robert Haas wrote:
> Based on discussion thus far it seems that there's a possibility that
> the trade-off may be different for short strings vs. long strings. If
> the string is small enough to fit in the L1 CPU cache, then it may be
> that memcmp() followed by
On Fri, Sep 12, 2014 at 12:02 PM, Robert Haas wrote:
> I think I've said a few times now that I really want to get this
> additional data before forming an opinion. As a certain Mr. Doyle
> writes, "It is a capital mistake to theorize before one has data.
> Insensibly one begins to twist facts to
On Fri, Sep 12, 2014 at 2:58 PM, Peter Geoghegan wrote:
> On Fri, Sep 12, 2014 at 11:38 AM, Robert Haas wrote:
>> Based on discussion thus far it seems that there's a possibility that
>> the trade-off may be different for short strings vs. long strings. If
>> the string is small enough to fit in
On Fri, Sep 12, 2014 at 11:38 AM, Robert Haas wrote:
> Based on discussion thus far it seems that there's a possibility that
> the trade-off may be different for short strings vs. long strings. If
> the string is small enough to fit in the L1 CPU cache, then it may be
> that memcmp() followed by
On Fri, Sep 12, 2014 at 5:28 AM, Heikki Linnakangas
wrote:
> On 09/12/2014 12:46 AM, Peter Geoghegan wrote:
>>
>> On Thu, Sep 11, 2014 at 1:50 PM, Robert Haas
>> wrote:
>>>
>>> I think I said pretty clearly that it was.
>>
>>
>> I agree that you did, but it wasn't clear exactly what factors you
>
On 09/12/2014 12:46 AM, Peter Geoghegan wrote:
On Thu, Sep 11, 2014 at 1:50 PM, Robert Haas wrote:
I think I said pretty clearly that it was.
I agree that you did, but it wasn't clear exactly what factors you
were asking me to simulate.
All factors.
Do you want me to compare the same stri
On Tue, Sep 9, 2014 at 2:25 PM, Robert Haas wrote:
>> I like that I don't have to care about every combination, and can
>> treat abbreviation abortion as the special case with the extra step,
>> in line with how I think of the optimization conceptually. Does that
>> make sense?
>
> No. comparetup
On Thu, Sep 11, 2014 at 1:50 PM, Robert Haas wrote:
> I think I said pretty clearly that it was.
I agree that you did, but it wasn't clear exactly what factors you
were asking me to simulate. It still isn't. Do you want me to compare
the same string a million times in a loop, both with a strcoll(
On Thu, Sep 11, 2014 at 4:13 PM, Peter Geoghegan wrote:
> On Wed, Sep 10, 2014 at 11:36 AM, Robert Haas wrote:
> No, not really. All you have to do is write a little test program to
>> gather the information.
>
> I don't think a little test program is useful - IMV it's too much of a
> simplific
On Wed, Sep 10, 2014 at 11:36 AM, Robert Haas wrote:
> No, not really. All you have to do is write a little test program to
> gather the information.
I don't think a little test program is useful - IMV it's too much of a
simplification to suppose that a strcoll() has a fixed cost, and a
memcmp()
On Wed, Sep 10, 2014 at 1:36 PM, Peter Geoghegan wrote:
>> In order to know how much we're
>> giving up in that case, we need the exact number I asked you to
>> provide in my previous email: the ratio of the cost of strcoll() to
>> the cost of memcmp().
>>
>> I see that you haven't chosen to provi
On Tue, Sep 9, 2014 at 2:00 PM, Robert Haas wrote:
> Boiled down, what you're saying is that you might have a set that
> contains lots of duplicates in general, but not very many where the
> abbreviated-keys also match. Sure, that's true.
Abbreviated keys are not used in the case where we do a (
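To make the distinction concrete, an abbreviated key is a fixed-size integer built from a prefix of the (transformed) string. When two abbreviated keys differ, their order is final, and only matching keys require the full comparison. The sketch below is a simplification and not PostgreSQL's implementation. It packs the first 8 bytes big-endian into a uint64_t, zero-padded. It assumes the input bytes already sort correctly under byte comparison (for example, C-collation bytes or strxfrm() output) and contain no embedded NUL bytes. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Pack up to the first 8 bytes big-endian, zero-padded, so that integer
 * order matches the byte order of the prefixes. */
static uint64_t
sketch_abbrev(const char *s, size_t len)
{
    uint64_t key = 0;

    for (size_t i = 0; i < 8; i++)
        key = (key << 8) | (i < len ? (unsigned char) s[i] : 0);
    return key;
}

/* Authoritative comparison: bytewise; on a common prefix the shorter sorts first. */
static int
sketch_full_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    int r = memcmp(a, b, alen < blen ? alen : blen);

    if (r != 0)
        return r;
    return (alen > blen) - (alen < blen);
}

/* Compare abbreviated keys first; fall back only when they are equal. */
static int
sketch_abbrev_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    uint64_t ka = sketch_abbrev(a, alen);
    uint64_t kb = sketch_abbrev(b, blen);

    if (ka != kb)
        return ka < kb ? -1 : 1;
    return sketch_full_cmp(a, alen, b, blen);
}
```

When many values share their first 8 bytes, every comparison pays for both steps. That is the unsympathetic case the abort logic is meant to detect.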
On Fri, Sep 5, 2014 at 10:45 PM, Peter Geoghegan wrote:
> While I gave serious consideration to your idea of having a dedicated
> abbreviation comparator, and not duplicating sortsupport state when
> abbreviated keys are used (going so far as to almost fully implement
> the idea), I ultimately dec
On Thu, Sep 4, 2014 at 5:46 PM, Peter Geoghegan wrote:
> On Thu, Sep 4, 2014 at 2:18 PM, Robert Haas wrote:
>> Eh, maybe? I'm not sure why the case where we're using abbreviated
>> keys should be different than the case we're not. In either case this
>> is a straightforward trade-off: if we do
On Sat, Sep 6, 2014 at 3:01 PM, Peter Geoghegan wrote:
> I attach another amendment/delta patch
Attached is another amendment to the patch set. With the recent
addition of abbreviation support on 32-bit platforms, we should just
hash the Datum representation as a uint32 on SIZEOF_DATUM != 8
platf
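As an illustration, the idea of hashing the abbreviated Datum as a 32-bit value on both 32-bit and 64-bit platforms can be sketched like this. The assumptions are as follows. uint64_t and uint32_t stand in for the 64-bit and 32-bit Datum representations. XOR-folding the two halves is one simple way to reduce a 64-bit key to 32 bits before hashing. sketch_mix32 is a generic integer mixer (the MurmurHash3 32-bit finalizer) standing in for whatever hash function the patch actually calls.

```c
#include <stdint.h>

/* MurmurHash3 32-bit finalizer, used here as a stand-in integer hash. */
static uint32_t
sketch_mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

/* 64-bit Datum: fold the high and low halves together, then hash as uint32. */
static uint32_t
sketch_hash_abbrev64(uint64_t abbrev)
{
    uint32_t lohalf = (uint32_t) abbrev;
    uint32_t hihalf = (uint32_t) (abbrev >> 32);

    return sketch_mix32(lohalf ^ hihalf);
}

/* 32-bit Datum: the abbreviated key already fits in a uint32, so hash it directly. */
static uint32_t
sketch_hash_abbrev32(uint32_t abbrev)
{
    return sketch_mix32(abbrev);
}
```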
On Fri, Sep 5, 2014 at 7:45 PM, Peter Geoghegan wrote:
> Attached additional patches are intended to be applied on top of most
> of the patches posted on September 2nd [1].
I attach another amendment/delta patch, intended to be applied on top
of what was posted yesterday. I neglected to remove
On Wed, Sep 3, 2014 at 2:44 PM, Peter Geoghegan wrote:
> I guess it should still be a configure option, then. Or maybe there
> should just be a USE_ABBREV_KEYS macro within pg_config_manual.h.
Attached additional patches are intended to be applied on top of most
of the patches posted on Septembe
On Thu, Sep 4, 2014 at 5:07 PM, Peter Geoghegan wrote:
> So I came up with what I imagined to be an unsympathetic case:
BTW, this "cities" data is still available from:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/data/cities.dump
--
Peter Geoghegan