On Wed, Jan 21, 2015 at 2:22 AM, Peter Geoghegan p...@heroku.com wrote:
You'll probably prefer the attached. This patch works by disabling
abbreviation, but only after writing out runs, with the final merge
left to go. That way, it doesn't matter when abbreviated keys are not
read back from
On Fri, Jan 23, 2015 at 2:18 AM, David Rowley dgrowle...@gmail.com wrote:
On 20 January 2015 at 17:10, Peter Geoghegan p...@heroku.com wrote:
On Mon, Jan 19, 2015 at 7:47 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
With your patch applied, the failure with MSVC disappeared, but
On 20 January 2015 at 17:10, Peter Geoghegan p...@heroku.com wrote:
On Mon, Jan 19, 2015 at 7:47 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
With your patch applied, the failure with MSVC disappeared, but there
is still a warning showing up:
(ClCompile target) -
Peter == Peter Geoghegan p...@heroku.com writes:
Peter> You'll probably prefer the attached. This patch works by
Peter> disabling abbreviation, but only after writing out runs, with
Peter> the final merge left to go. That way, it doesn't matter when
Peter> abbreviated keys are not read back from
Peter == Peter Geoghegan p...@heroku.com writes:
Peter> Basically, the intersection of the datum sort case with
Peter> abbreviated keys seems complicated.
Not to me. To me it seems completely trivial.
Now, I follow this general principle that someone who is not doing the
work should never say
Robert == Robert Haas robertmh...@gmail.com writes:
Robert> All right, it seems Tom is with you on that point, so after
Robert> some study, I've committed this with very minor modifications.
This caught my eye (thanks to conflict with GS patch):
* In the future, we should consider forcing the
On Mon, Jan 19, 2015 at 9:29 PM, Peter Geoghegan p...@heroku.com wrote:
I think that the attached patch should at least fix that much. Maybe
the problem on the other animal is also explained by the lack of this,
since there could also be a MinGW-ish strxfrm_l(), I suppose.
Committed that,
On Tue, Jan 20, 2015 at 3:34 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Jan 20, 2015 at 3:34 PM, Robert Haas robertmh...@gmail.com wrote:
Dear me. Peter, can you fix this RSN?
Investigating.
It's certainly possible to fix Andrew's test case with the attached.
I'm not sure that that's
On Tue, Jan 20, 2015 at 6:27 PM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:
Robert == Robert Haas robertmh...@gmail.com writes:
Robert> All right, it seems Tom is with you on that point, so after
Robert> some study, I've committed this with very minor modifications.
While hacking up a
On Tue, Jan 20, 2015 at 3:46 AM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:
The comment in tuplesort_begin_datum that abbreviation can't be used
seems wrong to me; why is the copy of the original value pointed to by
stup-tuple (in the case of by-reference types, and abbreviation is
Robert == Robert Haas robertmh...@gmail.com writes:
Robert> All right, it seems Tom is with you on that point, so after
Robert> some study, I've committed this with very minor modifications.
While hacking up a patch to demonstrate the simplicity of extending this
to the Datum sorter, I seem to
On Tue, Jan 20, 2015 at 2:00 PM, Peter Geoghegan p...@heroku.com wrote:
Maybe that's the
wrong way of fixing that, but for now I don't think it's acceptable
that abbreviation isn't always used in certain cases where it could
make sense (e.g. not for simple GroupAggregates with a single
On Tue, Jan 20, 2015 at 3:34 PM, Robert Haas robertmh...@gmail.com wrote:
Dear me. Peter, can you fix this RSN?
Investigating.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
On Tue, Jan 20, 2015 at 3:33 PM, Robert Haas robertmh...@gmail.com wrote:
Peter, this made bowerbird (Windows 8/Visual Studio) build, but it's
failing make check. Ditto hamerkop (Windows 2k8/VC++) and currawong
(Windows XP Pro/MSVC++). jacana (Windows 8/gcc) and brolga (Windows
XP
On Tue, Jan 20, 2015 at 3:57 PM, Peter Geoghegan p...@heroku.com wrote:
It's certainly possible to fix Andrew's test case with the attached.
I'm not sure that that's the appropriate fix, though: there is
probably a case to be made for not bothering with abbreviation once
we've read tuples in
On Tue, Jan 20, 2015 at 10:54 AM, Robert Haas robertmh...@gmail.com wrote:
On Mon, Jan 19, 2015 at 9:29 PM, Peter Geoghegan p...@heroku.com wrote:
I think that the attached patch should at least fix that much. Maybe
the problem on the other animal is also explained by the lack of this,
since
On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote:
I was assuming we were going to fix this by undoing the abbreviation
(as in the abort case) when we spill to disk, and not bothering with
it thereafter.
The spill-to-disk case is at least as compelling as the internal
On Tue, Jan 20, 2015 at 9:33 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Jan 20, 2015 at 6:30 PM, Robert Haas robertmh...@gmail.com wrote:
I don't want to change the on-disk format for tapes without a lot more
discussion. Can you come up with a fix that avoids that for now?
A more
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote:
I was assuming we were going to fix this by undoing the abbreviation
(as in the abort case) when we spill to disk, and not bothering with
it
On Tue, Jan 20, 2015 at 6:30 PM, Robert Haas robertmh...@gmail.com wrote:
I don't want to change the on-disk format for tapes without a lot more
discussion. Can you come up with a fix that avoids that for now?
A more conservative approach would be to perform conversion on-the-fly
once more.
On Tue, Jan 20, 2015 at 7:07 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Jan 20, 2015 at 3:57 PM, Peter Geoghegan p...@heroku.com wrote:
It's certainly possible to fix Andrew's test case with the attached.
I'm not sure that that's the appropriate fix, though: there is
probably a case to
On Tue, Jan 20, 2015 at 6:39 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Jan 20, 2015 at 6:34 PM, Robert Haas robertmh...@gmail.com wrote:
That might be OK. Probably needs a bit of performance testing to see
how it looks.
Well, we're still only doing it when we do our final merge. So
On Tue, Jan 20, 2015 at 5:42 PM, Robert Haas robertmh...@gmail.com wrote:
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote:
I was assuming we were going to fix this by undoing the abbreviation
(as
On Tue, Jan 20, 2015 at 5:46 PM, Peter Geoghegan p...@heroku.com wrote:
Would you prefer it if the spill-to-disk case
aborted in the style of low entropy keys? That doesn't seem
significantly safer than this, and it's certainly not acceptable from a
performance perspective.
BTW, I can write
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas robertmh...@gmail.com wrote:
I was assuming we were going to fix this by undoing the abbreviation
(as in the abort case) when we spill to disk, and not bothering with
it
On Tue, Jan 20, 2015 at 6:34 PM, Robert Haas robertmh...@gmail.com wrote:
That might be OK. Probably needs a bit of performance testing to see
how it looks.
Well, we're still only doing it when we do our final merge. So that's
only doubling the number of conversions required, which if we're
On Tue, Jan 20, 2015 at 11:29 AM, Peter Geoghegan p...@heroku.com wrote:
On Mon, Jan 19, 2015 at 5:59 PM, Peter Geoghegan p...@heroku.com wrote:
On Mon, Jan 19, 2015 at 5:33 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
You did notice that bowerbird isn't building, right?
Peter Geoghegan wrote:
It appears that the buildfarm animal brolga isn't happy about this
patch. I'm not sure why, since I thought we already figured out bugs
or other inconsistencies in various strxfrm() implementations.
You did notice that bowerbird isn't building, right?
On Mon, Jan 19, 2015 at 5:43 PM, Peter Geoghegan p...@heroku.com wrote:
It appears that the buildfarm animal brolga isn't happy about this
patch. I'm not sure why, since I thought we already figured out bugs
or other inconsistencies in various strxfrm() implementations.
Well, the first thing
On Mon, Jan 19, 2015 at 7:47 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
On MinGW-32, not that I know of:
$ find . -name *.h | xgrep strxfrm_l
./lib/gcc/mingw32/4.8.1/include/c++/mingw32/bits/c++config.h:/* Define if strxfrm_l is available in string.h. */
On Mon, Jan 19, 2015 at 5:59 PM, Peter Geoghegan p...@heroku.com wrote:
On Mon, Jan 19, 2015 at 5:33 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
You did notice that bowerbird isn't building, right?
On Mon, Jan 19, 2015 at 5:33 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
You did notice that bowerbird isn't building, right?
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2015-01-19%2023%3A54%3A46
Yeah. Looks like strxfrm_l() isn't available on the animal, for
On Tue, Dec 2, 2014 at 8:28 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Dec 2, 2014 at 2:16 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote:
Well, maybe you should make the updates we've agreed on and I can take
another
On Mon, Jan 19, 2015 at 3:33 PM, Robert Haas robertmh...@gmail.com wrote:
All right, it seems Tom is with you on that point, so after some
study, I've committed this with very minor modifications. Sorry for
the long delay. I have not committed the 0002 patch, though, because
I haven't
* Robert Haas (robertmh...@gmail.com) wrote:
On the PPC64 machine I normally use for performance testing, it takes
about 6.3 seconds to build the index with the commit just before this
one. With this commit, it drops to 1.9 seconds. That's more than a
3x speedup!
Now, if I change the
There is an interesting thread about strcoll() overhead over on -general:
http://www.postgresql.org/message-id/cab25xexnondrmc1_cy3jvmb0tmydm38ef9q2d7xla0rbncj...@mail.gmail.com
My guess was that this person experienced a rather unexpected downside
of spilling to disk when sorting on a text
On Tue, Dec 2, 2014 at 5:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Peter Geoghegan p...@heroku.com writes:
On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas robertmh...@gmail.com wrote:
Right, and what I'm saying is that maybe the applicability flag
shouldn't be stored in the SortSupport object, but
On Tue, Dec 2, 2014 at 1:21 PM, Peter Geoghegan p...@heroku.com wrote:
Incidentally, I think that an under-appreciated possible source of
regressions here is that attributes abbreviated have a strong
physical/logical correlation. I could see a small regression for one
such case even though my
On Tue, Nov 25, 2014 at 1:38 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Nov 25, 2014 at 4:01 AM, Robert Haas robertmh...@gmail.com wrote:
- This appears to needlessly reindent the comments for PG_CACHE_LINE_SIZE.
Actually, the word "only" is removed (because PG_CACHE_LINE_SIZE has a
new
On Tue, Dec 2, 2014 at 1:00 PM, Robert Haas robertmh...@gmail.com wrote:
I'd prefer not to have a #define in pg_config_manual.h. Only stuff
that we expect a reasonably decent number of users to need to change
should be in that file, and this is too marginal for that. If anybody
other than
On Tue, Dec 2, 2014 at 4:21 PM, Peter Geoghegan p...@heroku.com wrote:
I'm not sure about that. I'd prefer to have tuplesort (and one or two
other sites) set the "abbreviation is possible in principle" flag.
Otherwise, sortsupport needs to assume that the leading attribute is
going to be the
On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote:
Well, maybe you should make the updates we've agreed on and I can take
another look at it.
Agreed.
But I didn't think that I was proposing to change
anything about the level at which the decision about whether to
On Tue, Dec 2, 2014 at 5:16 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote:
Well, maybe you should make the updates we've agreed on and I can take
another look at it.
Agreed.
But I didn't think that I was proposing to
On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas robertmh...@gmail.com wrote:
Right, and what I'm saying is that maybe the applicability flag
shouldn't be stored in the SortSupport object, but passed down as an
argument.
But then how does that information get to any given sortsupport
routine?
Peter Geoghegan p...@heroku.com writes:
On Tue, Dec 2, 2014 at 2:21 PM, Robert Haas robertmh...@gmail.com wrote:
Right, and what I'm saying is that maybe the applicability flag
shouldn't be stored in the SortSupport object, but passed down as an
argument.
But then how does that information
On Tue, Dec 2, 2014 at 2:16 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Dec 2, 2014 at 2:07 PM, Robert Haas robertmh...@gmail.com wrote:
Well, maybe you should make the updates we've agreed on and I can take
another look at it.
Agreed.
Attached, revised patchset makes these updates. I
On Tue, Dec 2, 2014 at 5:28 PM, Peter Geoghegan p...@heroku.com wrote:
Attached, revised patchset makes these updates.
Whoops. Missed some obsolete comments. Here is a third commit that
makes a further small modification to one comment.
--
Peter Geoghegan
From
On Sun, Nov 9, 2014 at 10:02 PM, Peter Geoghegan p...@heroku.com wrote:
On Sat, Oct 11, 2014 at 6:34 PM, Peter Geoghegan p...@heroku.com wrote:
Attached patch, when applied, accelerates all tuplesort cases using
abbreviated keys, building on previous work here, as well as the patch
posted to
On Tue, Nov 25, 2014 at 4:01 AM, Robert Haas robertmh...@gmail.com wrote:
- This appears to needlessly reindent the comments for PG_CACHE_LINE_SIZE.
Actually, the word "only" is removed (because PG_CACHE_LINE_SIZE has a
new client now). So it isn't quite the same paragraph as before.
- I really
On Tue, Nov 25, 2014 at 10:38 AM, Peter Geoghegan p...@heroku.com wrote:
- Also, I don't think making abbrev_state an enumerated value with two
values is really doing anything for us; we could just use a Boolean.
I'm wondering if we should actually go a bit further and remove this
from the
On Sat, Oct 11, 2014 at 6:34 PM, Peter Geoghegan p...@heroku.com wrote:
Attached patch, when applied, accelerates all tuplesort cases using
abbreviated keys, building on previous work here, as well as the patch
posted to that other thread.
I attach an updated patch set, rebased on top of the
On Mon, Sep 29, 2014 at 10:34 PM, Peter Geoghegan p...@heroku.com wrote:
single sortsupport state patch.
You probably noticed that I posted an independently useful patch to
make all tuplesort cases use sortsupport [1] - currently, both the
B-Tree and CLUSTER cases do not use the sortsupport
On Wed, Sep 24, 2014 at 7:04 PM, Peter Geoghegan p...@heroku.com wrote:
On Fri, Sep 19, 2014 at 2:54 PM, Peter Geoghegan p...@heroku.com wrote:
Probably not - it appears to make very little difference to
unoptimized pass-by-reference types whether or not datum1 can be used
(see my simulation
On Thu, Sep 25, 2014 at 9:21 AM, Robert Haas robertmh...@gmail.com wrote:
The top issue on my agenda is figuring out a way to get rid of the
extra SortSupport object.
Really? I'm surprised. Clearly the need to restart heap tuple copying
from scratch, in order to make the datum1 representation
On Thu, Sep 25, 2014 at 2:05 PM, Peter Geoghegan p...@heroku.com wrote:
On Thu, Sep 25, 2014 at 9:21 AM, Robert Haas robertmh...@gmail.com wrote:
The top issue on my agenda is figuring out a way to get rid of the
extra SortSupport object.
Really? I'm surprised. Clearly the need to restart
On Thu, Sep 25, 2014 at 11:53 AM, Robert Haas robertmh...@gmail.com wrote:
I haven't looked at that part of the patch in detail yet, so... not
really. But I don't see why you'd ever need to restart heap tuple
copying. At most you'd need to re-extract datum1 from the tuples you
have already
On Thu, Sep 25, 2014 at 3:17 PM, Peter Geoghegan p...@heroku.com wrote:
To find out how much that optimization buys, you
should use tuples with many variable-length columns (say, 50)
preceding the text column you're sorting on. I won't be surprised if
that turns out to be expensive enough to
On Fri, Sep 19, 2014 at 2:54 PM, Peter Geoghegan p...@heroku.com wrote:
Probably not - it appears to make very little difference to
unoptimized pass-by-reference types whether or not datum1 can be used
(see my simulation of Kevin's worst case, for example [1]). Streaming
through a not
On Tue, Sep 16, 2014 at 4:55 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Sep 16, 2014 at 1:45 PM, Robert Haas robertmh...@gmail.com wrote:
Even though our testing seems to indicate that the memcmp() is
basically free, I think it would be good to make the effort to avoid
doing memcmp()
On Fri, Sep 19, 2014 at 9:59 AM, Robert Haas robertmh...@gmail.com wrote:
OK, good point. So committed as-is, then, except that I rewrote the
comments, which I felt were excessively long for the amount of code.
Thanks!
I look forward to hearing your thoughts on the open issues with the
patch
On Thu, Sep 11, 2014 at 8:34 PM, Peter Geoghegan p...@heroku.com wrote:
On Tue, Sep 9, 2014 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote:
I like that I don't have to care about every combination, and can
treat abbreviation abortion as the special case with the extra step,
in line with
On Fri, Sep 19, 2014 at 2:35 PM, Robert Haas robertmh...@gmail.com wrote:
Also, shouldn't you go back and fix up
those abbreviated keys to point to datum1 again if you abort?
Probably not - it appears to make very little difference to
unoptimized pass-by-reference types whether or not datum1
On Mon, Sep 15, 2014 at 7:21 PM, Peter Geoghegan p...@heroku.com wrote:
On Mon, Sep 15, 2014 at 11:25 AM, Peter Geoghegan p...@heroku.com wrote:
OK, I'll draft a patch for that today, including similar alterations
to varstr_cmp() for the benefit of Windows and so on.
I attach a much simpler
On Tue, Sep 16, 2014 at 1:45 PM, Robert Haas robertmh...@gmail.com wrote:
Even though our testing seems to indicate that the memcmp() is
basically free, I think it would be good to make the effort to avoid
doing memcmp() and then strcoll() and then strncmp(). Seems like it
shouldn't be too
On 09/14/2014 11:34 PM, Peter Geoghegan wrote:
On Sun, Sep 14, 2014 at 7:37 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
Both values vary in range 5.9 - 6.1 s, so it's fair to say that the useless
memcmp() is free with these parameters.
Is this the worst case scenario?
Other than
On Sun, Sep 14, 2014 at 10:37 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
On 09/13/2014 11:28 PM, Peter Geoghegan wrote:
Anyway, attached rough test program implements what you outline. This
is for 30,000 32 byte strings (where just the final two bytes differ).
On my laptop, output
On Mon, Sep 15, 2014 at 10:17 AM, Robert Haas robertmh...@gmail.com wrote:
It strikes me that perhaps we should make this change (rearranging
things so that the memcmp tiebreak is run before strcoll) first,
before dealing with the rest of the abbreviated keys infrastructure.
It appears to be a
On Mon, Sep 15, 2014 at 1:34 PM, Peter Geoghegan p...@heroku.com wrote:
On Mon, Sep 15, 2014 at 10:17 AM, Robert Haas robertmh...@gmail.com wrote:
It strikes me that perhaps we should make this change (rearranging
things so that the memcmp tiebreak is run before strcoll) first,
before dealing
On Mon, Sep 15, 2014 at 10:53 AM, Robert Haas robertmh...@gmail.com wrote:
I think there's probably more than that to work out, but in any case
there's no harm in getting a simple optimization done first before
moving on to a complicated one.
I guess we never talked about the abort logic in
On Mon, Sep 15, 2014 at 1:55 PM, Peter Geoghegan p...@heroku.com wrote:
On Mon, Sep 15, 2014 at 10:53 AM, Robert Haas robertmh...@gmail.com wrote:
I think there's probably more than that to work out, but in any case
there's no harm in getting a simple optimization done first before
moving on
On Mon, Sep 15, 2014 at 11:20 AM, Robert Haas robertmh...@gmail.com wrote:
...looks like about a 10-line patch. We have the data to show that
the loss is trivial even in the worst case, and we have or should be
able to get data showing that the best-case win is significant even
without the
On Mon, Sep 15, 2014 at 11:25 AM, Peter Geoghegan p...@heroku.com wrote:
OK, I'll draft a patch for that today, including similar alterations
to varstr_cmp() for the benefit of Windows and so on.
I attach a much simpler patch, that only adds an opportunistic
memcmp() == 0 before a possible
On Mon, Sep 15, 2014 at 4:21 PM, Peter Geoghegan p...@heroku.com wrote:
I attach a much simpler patch, that only adds an opportunistic
memcmp() == 0 before a possible strcoll(). Both
bttextfastcmp_locale() and varstr_cmp() have the optimization added,
since there is no point in leaving anyone
On 09/13/2014 11:28 PM, Peter Geoghegan wrote:
Anyway, attached rough test program implements what you outline. This
is for 30,000 32 byte strings (where just the final two bytes differ).
On my laptop, output looks like this (edited to only show median
duration in each case):
Got to be careful
On Sun, Sep 14, 2014 at 7:37 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
Got to be careful to not let the compiler optimize away microbenchmarks like
this. At least with my version of gcc, the strcoll calls get optimized away,
as do the memcmp calls, if you don't use the result for
On Fri, Sep 12, 2014 at 11:38 AM, Robert Haas robertmh...@gmail.com wrote:
Based on discussion thus far it seems that there's a possibility that
the trade-off may be different for short strings vs. long strings. If
the string is small enough to fit in the L1 CPU cache, then it may be
that
On 09/12/2014 12:46 AM, Peter Geoghegan wrote:
On Thu, Sep 11, 2014 at 1:50 PM, Robert Haas robertmh...@gmail.com wrote:
I think I said pretty clearly that it was.
I agree that you did, but it wasn't clear exactly what factors you
were asking me to simulate.
All factors.
Do you want me to
On Fri, Sep 12, 2014 at 5:28 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
On 09/12/2014 12:46 AM, Peter Geoghegan wrote:
On Thu, Sep 11, 2014 at 1:50 PM, Robert Haas robertmh...@gmail.com
wrote:
I think I said pretty clearly that it was.
I agree that you did, but it wasn't clear
On Fri, Sep 12, 2014 at 11:38 AM, Robert Haas robertmh...@gmail.com wrote:
Based on discussion thus far it seems that there's a possibility that
the trade-off may be different for short strings vs. long strings. If
the string is small enough to fit in the L1 CPU cache, then it may be
that
On Fri, Sep 12, 2014 at 2:58 PM, Peter Geoghegan p...@heroku.com wrote:
On Fri, Sep 12, 2014 at 11:38 AM, Robert Haas robertmh...@gmail.com wrote:
Based on discussion thus far it seems that there's a possibility that
the trade-off may be different for short strings vs. long strings. If
the
On Fri, Sep 12, 2014 at 12:02 PM, Robert Haas robertmh...@gmail.com wrote:
I think I've said a few times now that I really want to get this
additional data before forming an opinion. As a certain Mr. Doyle
writes, It is a capital mistake to theorize before one has data.
Insensibly one begins
On Wed, Sep 10, 2014 at 11:36 AM, Robert Haas robertmh...@gmail.com wrote:
No, not really. All you have to do is write a little test program to
gather the information.
I don't think a little test program is useful - IMV it's too much of a
simplification to suppose that a strcoll() has a fixed
On Thu, Sep 11, 2014 at 4:13 PM, Peter Geoghegan p...@heroku.com wrote:
On Wed, Sep 10, 2014 at 11:36 AM, Robert Haas robertmh...@gmail.com wrote:
No, not really. All you have to do is write a little test program to
gather the information.
I don't think a little test program is useful - IMV
On Thu, Sep 11, 2014 at 1:50 PM, Robert Haas robertmh...@gmail.com wrote:
I think I said pretty clearly that it was.
I agree that you did, but it wasn't clear exactly what factors you
were asking me to simulate. It still isn't. Do you want me to compare
the same string a million times in a loop,
On Tue, Sep 9, 2014 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote:
I like that I don't have to care about every combination, and can
treat abbreviation abortion as the special case with the extra step,
in line with how I think of the optimization conceptually. Does that
make sense?
No.
On Tue, Sep 9, 2014 at 2:00 PM, Robert Haas robertmh...@gmail.com wrote:
Boiled down, what you're saying is that you might have a set that
contains lots of duplicates in general, but not very many where the
abbreviated-keys also match. Sure, that's true.
Abbreviated keys are not used in the
On Wed, Sep 10, 2014 at 1:36 PM, Peter Geoghegan p...@heroku.com wrote:
In order to know how much we're
giving up in that case, we need the exact number I asked you to
provide in my previous email: the ratio of the cost of strcoll() to
the cost of memcmp().
I see that you haven't chosen to
On Thu, Sep 4, 2014 at 5:46 PM, Peter Geoghegan p...@heroku.com wrote:
On Thu, Sep 4, 2014 at 2:18 PM, Robert Haas robertmh...@gmail.com wrote:
Eh, maybe? I'm not sure why the case where we're using abbreviated
keys should be different than the case we're not. In either case this
is a
On Fri, Sep 5, 2014 at 10:45 PM, Peter Geoghegan p...@heroku.com wrote:
While I gave serious consideration to your idea of having a dedicated
abbreviation comparator, and not duplicating sortsupport state when
abbreviated keys are used (going so far as to almost fully implement
the idea), I
On Fri, Sep 5, 2014 at 7:45 PM, Peter Geoghegan p...@heroku.com wrote:
Attached additional patches are intended to be applied on top of most
of the patches posted on September 2nd [1].
I attach another amendment/delta patch, intended to be applied on top
of what was posted yesterday. I
On Sat, Sep 6, 2014 at 3:01 PM, Peter Geoghegan p...@heroku.com wrote:
I attach another amendment/delta patch
Attached is another amendment to the patch set. With the recent
addition of abbreviation support on 32-bit platforms, we should just
hash the Datum representation as a uint32 on
On Wed, Sep 3, 2014 at 2:44 PM, Peter Geoghegan p...@heroku.com wrote:
I guess it should still be a configure option, then. Or maybe there
should just be a USE_ABBREV_KEYS macro within pg_config_manual.h.
Attached additional patches are intended to be applied on top of most
of the patches
On Tue, Sep 2, 2014 at 10:27 PM, Peter Geoghegan p...@heroku.com wrote:
* Still doesn't address the open question of whether or not we should
optimistically always try memcmp() == 0 on tiebreak. I still lean
towards yes.
Let m be the cost of a memcmp() that fails near the end of the
strings;
On Wed, Sep 3, 2014 at 5:44 PM, Peter Geoghegan p...@heroku.com wrote:
On Wed, Sep 3, 2014 at 2:18 PM, Robert Haas robertmh...@gmail.com wrote:
My suggestion is to remove the special cases for Darwin and 32-bit
systems and see how it goes.
I guess it should still be a configure option, then.
On Thu, Sep 4, 2014 at 9:19 AM, Robert Haas robertmh...@gmail.com wrote:
On Tue, Sep 2, 2014 at 10:27 PM, Peter Geoghegan p...@heroku.com wrote:
* Still doesn't address the open question of whether or not we should
optimistically always try memcmp() == 0 on tiebreak. I still lean
towards yes.
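
For readers skimming the thread, the "memcmp() == 0 on tiebreak" idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: the function name cmp_with_memcmp_first is hypothetical, it works on NUL-terminated C strings rather than PostgreSQL text datums, and it uses whatever locale the process currently has set.

```c
#include <string.h>

/*
 * Hypothetical helper for illustration only (not PostgreSQL's actual
 * comparator): compare two NUL-terminated strings under the current
 * locale, trying a cheap bytewise equality check before paying for
 * strcoll().
 */
static int
cmp_with_memcmp_first(const char *a, const char *b)
{
    size_t      len_a = strlen(a);
    size_t      len_b = strlen(b);
    int         result;

    /* Same length and identical bytes: equal under any collation. */
    if (len_a == len_b && memcmp(a, b, len_a) == 0)
        return 0;

    /* Otherwise fall back to the locale-aware comparison. */
    result = strcoll(a, b);

    /* If the locale calls differing bytes equal, break the tie bytewise. */
    if (result == 0)
        result = strcmp(a, b);

    return result;
}
```

The trade-off being debated is visible here: when the strings are bytewise equal the strcoll() call is skipped entirely, and when they are not, the only extra cost is a memcmp() that may have to scan most of the string before failing.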
On Thu, Sep 4, 2014 at 2:12 PM, Peter Geoghegan p...@heroku.com wrote:
On Thu, Sep 4, 2014 at 9:19 AM, Robert Haas robertmh...@gmail.com wrote:
On Tue, Sep 2, 2014 at 10:27 PM, Peter Geoghegan p...@heroku.com wrote:
* Still doesn't address the open question of whether or not we should
On Thu, Sep 4, 2014 at 2:18 PM, Robert Haas robertmh...@gmail.com wrote:
Eh, maybe? I'm not sure why the case where we're using abbreviated
keys should be different than the case we're not. In either case this
is a straightforward trade-off: if we do a memcmp() before strcoll(),
we win if it
On Thu, Sep 4, 2014 at 11:12 AM, Peter Geoghegan p...@heroku.com wrote:
What I
consider an open question is whether or not we should do that on the
first call when there is no abbreviated comparison, such as on the
second or subsequent attribute in a multi-column sort, in the hope
that
On Thu, Sep 4, 2014 at 5:07 PM, Peter Geoghegan p...@heroku.com wrote:
So I came up with what I imagined to be an unsympathetic case:
BTW, this cities data is still available from:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/data/cities.dump
--
Peter Geoghegan
On Tue, Sep 2, 2014 at 4:41 PM, Peter Geoghegan p...@heroku.com wrote:
HyperLogLog isn't sample-based - it's useful for streaming a set and
accurately tracking its cardinality with fixed overhead.
OK.
Is it the right decision to suppress the abbreviated-key optimization
unconditionally on