Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-19 Thread Martijn van Oosterhout
On Tue, Jan 18, 2011 at 10:03:01AM +0200, Heikki Linnakangas wrote: >> That isn't ever going to happen, unless you'd like to give up hash joins >> and hash aggregation on text values. > > You could canonicalize the string first in the hash function. I'm not > sure if we have all the necessary inf
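
A minimal sketch of the "canonicalize first in the hash function" idea, assuming ASCII case folding as a stand-in canonical form. The names `canon_byte`, `canon_hash`, and `canon_eq` are hypothetical and are not PostgreSQL functions; the point is only that the hash and the equality test must agree on the same canonical form for hash joins and hash aggregation to stay correct.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical canonical form: ASCII lower-casing, for illustration only. */
static unsigned char canon_byte(unsigned char c)
{
    return (unsigned char) tolower(c);
}

/* FNV-1a hash computed over the canonicalized bytes, not the raw bytes. */
static uint32_t canon_hash(const char *s, size_t len)
{
    assert(s != NULL);
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= canon_byte((unsigned char) s[i]);
        h *= 16777619u;
    }
    return h;
}

/*
 * Equality under the same canonical form. The length check is valid only
 * because this toy canonicalization preserves length; a form such as
 * Unicode normalization can change the length, so it would not be.
 */
static bool canon_eq(const char *a, size_t alen, const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    if (alen != blen)
        return false;
    for (size_t i = 0; i < alen; i++) {
        if (canon_byte((unsigned char) a[i]) != canon_byte((unsigned char) b[i]))
            return false;
    }
    return true;
}
```

Under these assumptions, any two strings that `canon_eq` treats as equal also produce the same `canon_hash` value, which is the property a hash join relies on.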

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-18 Thread Tom Lane
Robert Haas writes: > On Tue, Jan 18, 2011 at 11:44 AM, Tom Lane wrote: >> Oh, I misread Itagaki-san's comment to imply that that *was* in the >> patch.  Maybe I should go read it. > Perhaps. :-) > While you're at it you might commit it. :-) Yeah, as penance I'll take this one.

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-18 Thread Robert Haas
On Tue, Jan 18, 2011 at 11:44 AM, Tom Lane wrote: > Robert Haas writes: >> On Tue, Jan 18, 2011 at 11:15 AM, Tom Lane wrote: >>> No, I don't think so.  Has any evidence been submitted that that part of >>> the patch is of benefit? > >> I think you might be mixing up what's actually in the patch

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-18 Thread Tom Lane
Robert Haas writes: > On Tue, Jan 18, 2011 at 11:15 AM, Tom Lane wrote: >> No, I don't think so.  Has any evidence been submitted that that part of >> the patch is of benefit? > I think you might be mixing up what's actually in the patch with > another idea that was proposed but isn't actually i

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-18 Thread Robert Haas
On Tue, Jan 18, 2011 at 11:15 AM, Tom Lane wrote: >> It's a very light-weight alternative of memcmp the byte data, >> but there is still the same issue -- we might have different >> compressed results if we use different algorithm for TOASTing. > > Which makes it a lightweight waste of cycles. > >

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-18 Thread Tom Lane
Itagaki Takahiro writes: > On Tue, Jan 18, 2011 at 05:39, Tom Lane wrote: >> I haven't looked at this patch, but it seems to me that it would be >> reasonable to conclude A != B if the va_extsize values in the toast >> pointers don't agree. > It's a very light-weight alternative of memcmp the by

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-18 Thread Itagaki Takahiro
On Tue, Jan 18, 2011 at 05:39, Tom Lane wrote: > I haven't looked at this patch, but it seems to me that it would be > reasonable to conclude A != B if the va_extsize values in the toast > pointers don't agree. It's a very light-weight alternative of memcmp the byte data, but there is still the s

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-18 Thread Heikki Linnakangas
On 17.01.2011 22:33, Tom Lane wrote: Peter Eisentraut writes: On mån, 2011-01-17 at 07:35 +0100, Magnus Hagander wrote: In fact, aren't there cases where the *length test* also fails? Currently, two text values are only equal if strcoll() considers them equal and the bits are the same. So

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-17 Thread Peter Eisentraut
On mån, 2011-01-17 at 15:33 -0500, Tom Lane wrote: > Peter Eisentraut writes: > > On mån, 2011-01-17 at 07:35 +0100, Magnus Hagander wrote: > >> In fact, aren't there cases where the *length test* also fails? > > > Currently, two text values are only equal if strcoll() considers them > > equal an

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-17 Thread Noah Misch
On Mon, Jan 17, 2011 at 02:36:56PM -0600, Jim Nasby wrote: > On Jan 17, 2011, at 9:22 AM, Noah Misch wrote: > > Just to be clear, the code already has these length tests today. This patch > > just moves them before the detoast. > > Any reason we can't do this for other varlena? I'm wondering if i

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-17 Thread Tom Lane
Magnus Hagander writes: > I wonder if we can trust the *equality* test, but not the inequality? > E.g. if compressed(A) == compressed(B) we know they're the same, but > if compressed(A) != compressed(B) we don't know they're not; they still > might be. I haven't looked at this patch, but it seems

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-17 Thread Jim Nasby
On Jan 17, 2011, at 9:22 AM, Noah Misch wrote: > On Mon, Jan 17, 2011 at 07:35:52AM +0100, Magnus Hagander wrote: >> On Mon, Jan 17, 2011 at 06:51, Itagaki Takahiro >> wrote: >>> On Mon, Jan 17, 2011 at 04:05, Andy Colson wrote: This is a review of: https://commitfest.postgresql.org/act

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-17 Thread Tom Lane
Peter Eisentraut writes: > On mån, 2011-01-17 at 07:35 +0100, Magnus Hagander wrote: >> In fact, aren't there cases where the *length test* also fails? > Currently, two text values are only equal if strcoll() considers them > equal and the bits are the same. So this patch is safe in that regard

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-17 Thread Peter Eisentraut
On mån, 2011-01-17 at 07:55 -0500, Robert Haas wrote: > > There is, however, some desire to loosen this. Possible > applications > > are case-insensitive comparison and Unicode normalization. It's not > > going to happen soon, but it may be worth considering not putting in > an > > optimization t

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-17 Thread Noah Misch
On Mon, Jan 17, 2011 at 11:05:09AM +0100, Magnus Hagander wrote: > On Mon, Jan 17, 2011 at 09:13, Itagaki Takahiro > wrote: > > 2011/1/17 KaiGai Kohei : > >> Are you talking about an idea to apply toast id as an alternative key? > > > > No, probably. I'm just talking about whether "diff -q A.txt B

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-17 Thread Noah Misch
On Mon, Jan 17, 2011 at 07:35:52AM +0100, Magnus Hagander wrote: > On Mon, Jan 17, 2011 at 06:51, Itagaki Takahiro > wrote: > > On Mon, Jan 17, 2011 at 04:05, Andy Colson wrote: > >> This is a review of: > >> https://commitfest.postgresql.org/action/patch_view?id=468 > >> > >> Purpose: > >> =

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-17 Thread Robert Haas
On Mon, Jan 17, 2011 at 2:56 AM, Peter Eisentraut wrote: > On mån, 2011-01-17 at 07:35 +0100, Magnus Hagander wrote: >> For text, I think locales may make that impossible. Aren't there >> locale rules where two different characters can "behave the same" when >> comparing them? I know in Swedish at

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-17 Thread Magnus Hagander
On Mon, Jan 17, 2011 at 09:13, Itagaki Takahiro wrote: > 2011/1/17 KaiGai Kohei : >> Are you talking about an idea to apply toast id as an alternative key? > > No, probably. I'm just talking about whether "diff -q A.txt B.txt" and > "diff -q A.gz  B.gz" always returns the same result or not. > > .

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-17 Thread Itagaki Takahiro
2011/1/17 KaiGai Kohei : > Are you talking about an idea to apply toast id as an alternative key? No, probably. I'm just talking about whether "diff -q A.txt B.txt" and "diff -q A.gz B.gz" always returns the same result or not. ... I found it depends on version of gzip. So, if we use such logic,

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread Pavel Stehule
2011/1/17 Itagaki Takahiro : > On Mon, Jan 17, 2011 at 16:13, Pavel Stehule wrote: If we always generate same toasted byte sequences from the same raw values, we don't need to detoast at all to compare the contents. Is it possible or not? >>> >>> For bytea, it seems it would be poss

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread Peter Eisentraut
On mån, 2011-01-17 at 07:35 +0100, Magnus Hagander wrote: > For text, I think locales may make that impossible. Aren't there > locale rules where two different characters can "behave the same" when > comparing them? I know in Swedish at least w and v behave the same > when sorting (but not when com

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread Itagaki Takahiro
On Mon, Jan 17, 2011 at 16:13, Pavel Stehule wrote: >>> If we always generate same toasted byte sequences from the same raw >>> values, we don't need to detoast at all to compare the contents. >>> Is it possible or not? >> >> For bytea, it seems it would be possible. >> >> For text, I think locale

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread Pavel Stehule
2011/1/17 Magnus Hagander : > On Mon, Jan 17, 2011 at 06:51, Itagaki Takahiro > wrote: >> On Mon, Jan 17, 2011 at 04:05, Andy Colson wrote: >>> This is a review of: >>> https://commitfest.postgresql.org/action/patch_view?id=468 >>> >>> Purpose: >>> >>> Equal and not-equal _may_ be quickl

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread Magnus Hagander
On Mon, Jan 17, 2011 at 06:51, Itagaki Takahiro wrote: > On Mon, Jan 17, 2011 at 04:05, Andy Colson wrote: >> This is a review of: >> https://commitfest.postgresql.org/action/patch_view?id=468 >> >> Purpose: >> >> Equal and not-equal _may_ be quickly determined if their lengths are >> di

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread KaiGai Kohei
(2011/01/17 14:51), Itagaki Takahiro wrote: > On Mon, Jan 17, 2011 at 04:05, Andy Colson wrote: >> This is a review of: >> https://commitfest.postgresql.org/action/patch_view?id=468 >> >> Purpose: >> >> Equal and not-equal _may_ be quickly determined if their lengths are >> different. T

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread Itagaki Takahiro
On Mon, Jan 17, 2011 at 04:05, Andy Colson wrote: > This is a review of: > https://commitfest.postgresql.org/action/patch_view?id=468 > > Purpose: > > Equal and not-equal _may_ be quickly determined if their lengths are > different.   This _may_ be a huge speed up if we don't have to deto

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread Pavel Stehule
2011/1/16 Noah Misch : > On Sun, Jan 16, 2011 at 10:07:13PM +0100, Pavel Stehule wrote: >> I think, so we can have a function or macro that compare a varlena >> sizes. Some like >> >> Datum texteq(..) >> { >>      if (!datumsHasSameLength(PG_GETARG_DATUM(0), PG_GETARG_DATUM(1)) >>         PG_RETURN

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread Noah Misch
On Sun, Jan 16, 2011 at 10:07:13PM +0100, Pavel Stehule wrote: > I think, so we can have a function or macro that compare a varlena > sizes. Some like > > Datum texteq(..) > { > if (!datumsHasSameLength(PG_GETARG_DATUM(0), PG_GETARG_DATUM(1)) > PG_RETURN_FALSE(); > > ... actual

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread Noah Misch
On Sun, Jan 16, 2011 at 01:05:11PM -0600, Andy Colson wrote: > This is a review of: > https://commitfest.postgresql.org/action/patch_view?id=468 Thanks! > I created myself a more real world test, with a table with indexes and id's > and a large toasted field. > This will make about 600 records

Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread Pavel Stehule
Hello, I looked at this patch too. It's a good idea. I think we could have a function or macro that compares varlena sizes. Something like: Datum texteq(..) { if (!datumsHasSameLength(PG_GETARG_DATUM(0), PG_GETARG_DATUM(1))) PG_RETURN_FALSE(); ... actual code .. } Regards Pavel St
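
The length-before-detoast idea discussed throughout this thread can be sketched outside PostgreSQL as follows. This is an illustration under stated assumptions, not the patch itself: `StoredValue`, `fetch_payload`, and `stored_eq` are hypothetical names, the expensive fetch stands in for detoasting, and the shortcut is valid only while equality means bytewise equality.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a stored value: the raw length is cheap to
 * read from a header, while the payload is reachable only through an
 * expensive fetch. This is not PostgreSQL's varlena layout.
 */
typedef struct {
    size_t raw_len;       /* length of the uncompressed payload */
    const char *payload;  /* data returned by the expensive fetch */
    int fetch_count;      /* number of times the payload was fetched */
} StoredValue;

/* Simulates the expensive step and counts how often it happens. */
static const char *fetch_payload(StoredValue *v)
{
    v->fetch_count++;
    return v->payload;
}

/* Bytewise equality that compares lengths before fetching any payload. */
static bool stored_eq(StoredValue *a, StoredValue *b)
{
    assert(a != NULL && b != NULL);

    if (a->raw_len != b->raw_len)
        return false;  /* different lengths: the values cannot be equal */

    const char *pa = fetch_payload(a);
    const char *pb = fetch_payload(b);
    return memcmp(pa, pb, a->raw_len) == 0;
}
```

When the lengths differ, `stored_eq` returns without calling `fetch_payload` at all, which is where the saving would come from for large out-of-line values. When the lengths match, the full comparison still runs.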