On Wed, Jul 15, 2015 at 12:31 AM, Jeff Janes wrote:
> On Tue, Jul 7, 2015 at 6:33 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>>
>>
>> See Tom Lane's comment about downgrade scripts. I think just remove it is
>> a right solution.
>>
>
> The new patch removes the downgrade path
On Tue, Jul 7, 2015 at 6:33 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:
>
>
> See Tom Lane's comment about downgrade scripts. I think just remove it is
> a right solution.
>
The new patch removes the downgrade path and the ability to install the old
version.
(If anyone wants an ea
On Tue, Jun 30, 2015 at 11:28 PM, Jeff Janes wrote:
> On Tue, Jun 30, 2015 at 2:46 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Sun, Jun 28, 2015 at 1:17 AM, Jeff Janes wrote:
>>
>>> This patch implements version 1.2 of contrib module pg_trgm.
>>>
>>> This supports the tr
Jeff Janes writes:
> On Tue, Jun 30, 2015 at 2:46 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>> pg_trgm--1.1.sql and pg_trgm--1.1--1.2.sql are useful for debug, but do you
>> expect them in final commit? As I can see in other contribs we have only
>> last version and upgrade scrip
On Tue, Jun 30, 2015 at 2:46 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:
> On Sun, Jun 28, 2015 at 1:17 AM, Jeff Janes wrote:
>
>> This patch implements version 1.2 of contrib module pg_trgm.
>>
>> This supports the triconsistent function, introduced in version 9.4 of
>> the server
On Sun, Jun 28, 2015 at 1:17 AM, Jeff Janes wrote:
> This patch implements version 1.2 of contrib module pg_trgm.
>
> This supports the triconsistent function, introduced in version 9.4 of the
> server, to make it faster to implement indexed queries where some keys are
> common and some are rare.
On Mon, Jun 29, 2015 at 7:23 AM, Merlin Moncure wrote:
> On Sat, Jun 27, 2015 at 5:17 PM, Jeff Janes wrote:
>> V1.1: Time: 1743.691 ms --- after repeated execution to warm the cache
>>
>> V1.2: Time: 2.839 ms --- after repeated execution to warm the cache
>
> Wow! I'm going to test this.
On Sat, Jun 27, 2015 at 5:17 PM, Jeff Janes wrote:
> This patch implements version 1.2 of contrib module pg_trgm.
>
> This supports the triconsistent function, introduced in version 9.4 of the
> server, to make it faster to implement indexed queries where some keys are
> common and some are rare.
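
To illustrate the idea behind a ternary consistent function, here is a minimal sketch in C. It is not the pg_trgm code: the Ternary type, tri_consistent_sketch, and the required threshold are hypothetical names chosen for illustration. The point is that a key reported as "maybe" (for example, a common key whose entries have not been fetched yet) still counts as a possible match, so a row can be ruled out only when too few keys can possibly match.

```c
/* Hypothetical three-valued logic type; not the server's own definition. */
typedef enum { T_FALSE = 0, T_TRUE = 1, T_MAYBE = 2 } Ternary;

/*
 * Sketch of a ternary consistent check (assumed logic, for illustration).
 * check[i] says whether key i is known present, known absent, or unknown.
 * required is the minimum number of matching keys the query needs.
 */
Ternary tri_consistent_sketch(const Ternary *check, int nkeys, int required)
{
    int possible = 0;

    for (int i = 0; i < nkeys; i++)
    {
        /* Both TRUE and MAYBE leave the door open for a match. */
        if (check[i] != T_FALSE)
            possible++;
    }

    /* Too few candidates: reject without looking at the unknown keys. */
    if (possible < required)
        return T_FALSE;

    /* Otherwise the row still has to be rechecked against the real value. */
    return T_MAYBE;
}
```

With this shape, a rare key that is known to be absent can be enough to reject a row before any common key is examined, which is one plausible reading of the speedup described above.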
Hello,
> If you manually set RPADDING 2 in trgm.h, then it will, but the
> allocation probably should use LPADDING/RPADDING to get it right, rather
> than assume the max values.
Yes you are right. For RPADDING = 2, the current formula is suitable but for
RPADDING =1, a lot of extra space is all
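
The arithmetic behind this can be checked with a small sketch in C. It assumes that a word is a run of ASCII alphanumerics and that a word of n characters, padded with lpad leading and rpad trailing blanks, yields n + lpad + rpad - 2 trigrams. The helper names count_padded_trigrams and allocation_slots are hypothetical, and allocation_slots only mirrors the (slen / 2 + 1) * 3 factor from the palloc call quoted in this thread.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Count the trigrams produced from s when every alphanumeric word is
 * padded with lpad leading and rpad trailing blanks (illustrative helper).
 */
static size_t count_padded_trigrams(const char *s, int lpad, int rpad)
{
    size_t total = 0;
    size_t wordlen = 0;

    for (const char *p = s;; p++)
    {
        if (*p != '\0' && isalnum((unsigned char) *p))
        {
            wordlen++;
            continue;
        }

        /* End of a word (or of the string): account for its trigrams. */
        if (wordlen > 0)
        {
            size_t padded = wordlen + (size_t) lpad + (size_t) rpad;

            if (padded >= 3)
                total += padded - 2;
            wordlen = 0;
        }

        if (*p == '\0')
            break;
    }
    return total;
}

/* Number of trigram slots implied by the quoted allocation formula. */
static size_t allocation_slots(size_t slen)
{
    return (slen / 2 + 1) * 3;
}
```

The worst case for a given length is a string of one-character words. For "a b c" (slen = 5) the formula reserves 9 slots; with padding 2 and 2 all 9 are used, while with padding 2 and 1 only 6 are, which is consistent with the observation above.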
On 03/09/2015 03:33 PM, Tom Lane wrote:
Beena Emerson writes:
In the pg_trgm module, within function generate_trgm, the memory for trigrams
is allocated as follows:
trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) *3);
I have been trying to understand why this is so becau
Beena Emerson writes:
> In the pg_trgm module, within function generate_trgm, the memory for trigrams
> is allocated as follows:
> trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) *3);
> I have been trying to understand why this is so because it seems to be
> allocating more spa
On 03/09/2015 02:54 PM, Alvaro Herrera wrote:
Beena Emerson wrote:
In the pg_trgm module, within function generate_trgm, the memory for trigrams
is allocated as follows:
trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) *3);
I have been trying to understand why this is so becau
Beena Emerson wrote:
> In the pg_trgm module, within function generate_trgm, the memory for trigrams
> is allocated as follows:
>
> trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) *3);
>
> I have been trying to understand why this is so because it seems to be
> allocating more s
On Fri, Nov 23, 2012 at 2:11 AM, Fujii Masao wrote:
> On Mon, Nov 19, 2012 at 10:56 AM, Tomas Vondra wrote:
>> I've done a quick review of the current patch:
>
> Thanks for the commit!
>
> As Alexander pointed out upthread, another infrastructure patch is required
> before applying this patch. So
On Mon, Nov 19, 2012 at 10:56 AM, Tomas Vondra wrote:
> I've done a quick review of the current patch:
Thanks for the commit!
As Alexander pointed out upthread, another infrastructure patch is required
before applying this patch. So I will implement the infra patch first.
Regards,
--
Fujii Ma
On Mon, Nov 19, 2012 at 7:55 PM, Alexander Korotkov
wrote:
> On Mon, Nov 19, 2012 at 10:05 AM, Alexander Korotkov
> wrote:
>>
>> On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao
>> wrote:
>>>
>>> Note that we cannot do a partial-match if KEEPONLYALNUM is disabled,
>>> i.e., if query key contains mu
On Mon, Nov 19, 2012 at 10:05 AM, Alexander Korotkov
wrote:
> On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao wrote:
>
>> Note that we cannot do a partial-match if KEEPONLYALNUM is disabled,
>> i.e., if query key contains multibyte characters. In this case, byte
>> length of
>> the trigram string mi
Hi!
On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao wrote:
> Note that we cannot do a partial-match if KEEPONLYALNUM is disabled,
> i.e., if query key contains multibyte characters. In this case, byte
> length of
> the trigram string might be larger than three, and its CRC is used as a
> trigram k
On 15.11.2012 20:39, Fujii Masao wrote:
> Hi,
>
> I'd like to propose to extend pg_trgm so that it can compare a partial-match
> query key to a GIN index. IOW, I'm thinking to implement the 'comparePartial'
> GIN method for pg_trgm.
>
> Currently, when the query key is less than three characters,
On Tue, Jun 14, 2011 at 1:15 AM, Tom Lane wrote:
> I'm not sure that pg_upgrade is a good vehicle for dispensing such
> advice, anyway. At least in the Red Hat packaging, end users will never
> read what it prints, unless maybe it fails outright and they're trying
> to debug why.
In my experienc
On Jun14, 2011, at 07:15 , Tom Lane wrote:
> Robert Haas writes:
>> On Mon, Jun 13, 2011 at 7:47 PM, Bruce Momjian wrote:
>>> No, it does not. Under what circumstances should I issue a suggestion
>>> to reindex, and what should the text be?
>
>> It sounds like GIN indexes need to be reindexed a
Robert Haas writes:
> On Mon, Jun 13, 2011 at 7:47 PM, Bruce Momjian wrote:
>> No, it does not. Under what circumstances should I issue a suggestion
>> to reindex, and what should the text be?
> It sounds like GIN indexes need to be reindexed after upgrading from <
> 9.1 to >= 9.1.
Only if you
Robert Haas wrote:
> On Mon, Jun 13, 2011 at 7:47 PM, Bruce Momjian wrote:
> > Robert Haas wrote:
> >> On Sun, Jun 12, 2011 at 8:40 AM, Florian Pflug wrote:
> >> > Note that this restriction was removed in postgres 9.1 which
> >> > is currently in beta. However, GIN indices must be re-created
> >
On Mon, Jun 13, 2011 at 7:47 PM, Bruce Momjian wrote:
> Robert Haas wrote:
>> On Sun, Jun 12, 2011 at 8:40 AM, Florian Pflug wrote:
>> > Note that this restriction was removed in postgres 9.1 which
>> > is currently in beta. However, GIN indices must be re-created
>> > with REINDEX after upgradin
Robert Haas wrote:
> On Sun, Jun 12, 2011 at 8:40 AM, Florian Pflug wrote:
> > Note that this restriction was removed in postgres 9.1 which
> > is currently in beta. However, GIN indices must be re-created
> > with REINDEX after upgrading from 9.0 to leverage that
> > improvement.
>
> Does pg_upg
On Sun, Jun 12, 2011 at 8:40 AM, Florian Pflug wrote:
> Note that this restriction was removed in postgres 9.1 which
> is currently in beta. However, GIN indices must be re-created
> with REINDEX after upgrading from 9.0 to leverage that
> improvement.
Does pg_upgrade know about this?
--
Robert
Hi
Next time, please post questions regarding the usage of postgres
to the -general list, not to -hackers. The purpose of -hackers is
to discuss the development of postgres proper, not the development
of applications using postgres.
On Jun12, 2011, at 13:33 , Sushant Sinha wrote:
> I am using pg_
Greg Stark writes:
> There seem to be three behaviours on the table here:
You're neglecting
4) Let the user decide whether he wants pg_trgm to consider word
elements to be "alphanumerics" or "any non-space".
The main problem I have with Tatsuo's patch is that it forecloses any
reasonably upward
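
The two candidate definitions of a word element can be contrasted with a short C sketch. It is an assumption-laden illustration, not a proposed patch: the WordPolicy enum and is_word_byte are hypothetical names, the check is per byte rather than per multibyte character, and it relies on the program running in the default C locale, where only ASCII letters and digits are alphanumeric.

```c
#include <ctype.h>

/* Hypothetical switch between the two behaviours under discussion. */
typedef enum { WORD_ALNUM, WORD_NONSPACE } WordPolicy;

/*
 * Return 1 if byte c counts as part of a word under the given policy,
 * 0 otherwise. In the C locale, bytes >= 0x80 are neither alphanumeric
 * nor whitespace, so the two policies disagree exactly on those bytes
 * (and on ASCII punctuation).
 */
int is_word_byte(unsigned char c, WordPolicy policy)
{
    if (policy == WORD_ALNUM)
        return isalnum(c) != 0;

    /* "any non-space": everything except whitespace is a word element */
    return isspace(c) == 0;
}
```

Under WORD_ALNUM a byte such as 0xE6 (the lead byte of many UTF-8 encoded CJK characters) is dropped, while under WORD_NONSPACE it is kept, which is the difference the thread is arguing about.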
On Sun, May 30, 2010 at 3:41 PM, Tom Lane wrote:
> I don't think it's unreasonable to insist that behavioral changes be
> made in an upward compatible fashion ... especially ones that seem as
> least as likely to break some current usages as to enable new usages.
Fwiw I don't think we've traditio
> > > This is in 9.0, because 8.4 doesn't recognize the \u escape syntax. If
> > > you run this in 8.4, you're just comparing a sequence of ASCII letters
> > > and digits.
> >
> > Hum. Still I prefer 8.4's behavior since anything is better than
> > returning NaN. It seems 9.0 does not have any es
Tatsuo Ishii writes:
>> This is still ignoring the point: arbitrarily changing the module's
>> longstanding standard behavior isn't acceptable. You need to provide
>> a way for the user to control the behavior. (Once you've done that,
>> I think it can be just either "alnum" or "!isspace", but m
On sön, 2010-05-30 at 11:05 +0900, Tatsuo Ishii wrote:
> > > Wait. This works fine for me with stock pg_trgm. locale is C and
> > > encoding is UTF8. What version of PostgreSQL are you using? Mine is
> > > 8.4.4.
> >
> > This is in 9.0, because 8.4 doesn't recognize the \u escape syntax. If
> > yo
> > Wait. This works fine for me with stock pg_trgm. locale is C and
> > encoding is UTF8. What version of PostgreSQL are you using? Mine is
> > 8.4.4.
>
> This is in 9.0, because 8.4 doesn't recognize the \u escape syntax. If
> you run this in 8.4, you're just comparing a sequence of ASCII letter
> This is still ignoring the point: arbitrarily changing the module's
> longstanding standard behavior isn't acceptable. You need to provide
> a way for the user to control the behavior. (Once you've done that,
> I think it can be just either "alnum" or "!isspace", but maybe some
> other behavior
Tatsuo Ishii writes:
> After thinking a little bit more, I think following patch would not
> break existing behavior and also adopts multibyte + C locale case. What
> do you think?
This is still ignoring the point: arbitrarily changing the module's
longstanding standard behavior isn't acceptable.
On Sat, May 29, 2010 at 9:13 AM, Tatsuo Ishii wrote:
> ! #define iswordchr(c) (lc_ctype_is_c()? \
> ! ((*(c) &
> 0x80)? !t_isspace(c) : (t_isalpha(c) || t_isdigit(c))) : \
>
Surely isspace(c) will always be false for non-ascii charac
> > It's not a practical solution for people working with prebuilt Postgres
> > versions, which is most people. I don't object to finding a way to
> > provide a "not-space" behavior instead of an "is-alnum" behavior,
> > but as noted upthread a GUC isn't the right way. How do you feel
> > about a
On fre, 2010-05-28 at 10:04 +0900, Tatsuo Ishii wrote:
> > I think the problem at hand has nothing at all to do with agglutination
> > or CJK-specific issues. You will get the same problem with other
> > languages *if* you set a locale that does not adequately support the
> > characters in use. E
> It's not a practical solution for people working with prebuilt Postgres
> versions, which is most people. I don't object to finding a way to
> provide a "not-space" behavior instead of an "is-alnum" behavior,
> but as noted upthread a GUC isn't the right way. How do you feel
> about a new set o
> I think the problem at hand has nothing at all to do with agglutination
> or CJK-specific issues. You will get the same problem with other
> languages *if* you set a locale that does not adequately support the
> characters in use. E.g., Russian with locale C and encoding UTF8:
>
> select simil
> Tatsuo Ishii writes:
> > similarity -> generate_trgm -> find_word -> iswordchr -> t_isalpha ->
> > isalpha
>
> > if locale is C and USE_WIDE_UPPER_LOWER defined which is the case in
> > most modern OSs.
>
> Quite. And *if locale is C then only standard ASCII letters are letters*.
> You may n
Tatsuo Ishii writes:
> Or you could just #undef KEEPONLYALNUM in trgm.h. But I'm not sure
> this is the right thing for you.
It's not a practical solution for people working with prebuilt Postgres
versions, which is most people. I don't object to finding a way to
provide a "not-space" behavior i
Tatsuo Ishii writes:
> similarity -> generate_trgm -> find_word -> iswordchr -> t_isalpha -> isalpha
> if locale is C and USE_WIDE_UPPER_LOWER defined which is the case in
> most modern OSs.
Quite. And *if locale is C then only standard ASCII letters are letters*.
You may not like that but it's
> > Problem with pg_trgm is, it uses isascii() etc. to recognize a letter,
> > which will skip any non ASCII range character in C locale.
>
> The only place I see that is in those ISPRINTABLE macros, which are only
> used in show_trgm(), which is just a debugging function. It could stand
> to be
Tatsuo Ishii writes:
> Problem with pg_trgm is, it uses isascii() etc. to recognize a letter,
> which will skip any non ASCII range character in C locale.
The only place I see that is in those ISPRINTABLE macros, which are only
used in show_trgm(), which is just a debugging function. It could st
> What I can't help wondering as I'm reading this discussion is -
> Tatsuo-san said upthread that he has a problem with pg_trgm that he
> does not have with full text search. So what is full text search
> doing differently than pg_trgm?
Problem with pg_trgm is, it uses isascii() etc. to recognize
> I think the problem at hand has nothing at all to do with agglutination
> or CJK-specific issues. You will get the same problem with other
> languages *if* you set a locale that does not adequately support the
> characters in use. E.g., Russian with locale C and encoding UTF8:
>
> select simil
On Thu, May 27, 2010 at 2:01 PM, Peter Eisentraut wrote:
> On fre, 2010-05-28 at 00:46 +0900, Tatsuo Ishii wrote:
>> > I don't know about Japanese, but the locale approach works just fine for
>> > other agglutinative languages. I would rather suspect that it is the
>> > trigram approach that migh
On fre, 2010-05-28 at 00:46 +0900, Tatsuo Ishii wrote:
> > I don't know about Japanese, but the locale approach works just fine for
> > other agglutinative languages. I would rather suspect that it is the
> > trigram approach that might be rather useless for such languages,
> > because you are goi
> So I think a GUC is broken because pg_trgm has index opclasses and
> any indexes built using one setting will be broken if the GUC is
> changed.
>
> Perhaps we need two sets of functions (which presumably call the same
> implementation with a flag to indicate which definition to use). Then
> y
> I don't know about Japanese, but the locale approach works just fine for
> other agglutinative languages. I would rather suspect that it is the
> trigram approach that might be rather useless for such languages,
> because you are going to get a lot of similarity hits for the affixes.
I'm not su
On tor, 2010-05-27 at 23:20 +0900, Tatsuo Ishii wrote:
> Anyway locale is completely useless for finding word vs non-character
> an agglutinative language such as Japanese.
I don't know about Japanese, but the locale approach works just fine for
other agglutinative languages. I would rather susp
On Thu, May 27, 2010 at 3:52 PM, Tom Lane wrote:
> I think a more appropriate type of fix would be to expose the
> KEEPONLYALNUM option as a GUC, or some other way of letting the
> user decide what he wants.
>
So I think a GUC is broken because pg_trgm has index opclasses and
any indexes built
Tatsuo Ishii writes:
> ! #define iswordchr(c)(t_isalpha(c) || t_isdigit(c) ||
> (lc_ctype_is_c() && !t_isspace(c)))
This seems entirely arbitrary. It might "fix" things in your view
but it will break the longstanding behavior for other people.
I think a more appropriate type of fix wou
> Well, that doesn't mean that the answer is to use C locale ;-)
Of course it's up to user whether to use C locale or not. I just want
pg_trgm to work with C locale as well.
> However, you could possibly think about making this bit of code
> more flexible:
>
> #ifdef KEEPONLYALNUM
> #define iswordc
Tatsuo Ishii writes:
> Anyway locale is completely useless for finding word vs non-character
> an agglutinative language such as Japanese.
Well, that doesn't mean that the answer is to use C locale ;-)
However, you could possibly think about making this bit of code
more flexible:
#ifdef KEEPON
> Exactly what do you consider to be the missing functionality?
> You need a notion of word vs non-word character from somewhere,
> and the locale setting is the standard place to get that. The
> core text search functionality behaves the same way.
No. Text search works fine with multibyte + C lo
Tatsuo Ishii writes:
>> It's not a problem, it's just pilot error, or possibly inadequate
>> documentation. pg_trgm uses the locale's definition of "alpha",
>> "digit", etc. In C locale only basic ASCII letters and digits will be
>> recognized as word constituents.
> That means there is no chan
> > Yes, pg_trgm seems to have problems with multibyte + C locale.
>
> It's not a problem, it's just pilot error, or possibly inadequate
> documentation. pg_trgm uses the locale's definition of "alpha",
> "digit", etc. In C locale only basic ASCII letters and digits will be
> recognized as word
Tatsuo Ishii writes:
> What is your locale?
>> It was en_EN.UTF-8. Interesting. With C it fails...
> Yes, pg_trgm seems to have problems with multibyte + C locale.
It's not a problem, it's just pilot error, or possibly inadequate
documentation. pg_trgm uses the locale's definition of "alpha",
"
> > What is your locale?
> It was en_EN.UTF-8. Interesting. With C it fails...
Yes, pg_trgm seems to have problems with multibyte + C locale.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
--
Sent via pgsql-hackers mailing list
On Thursday 27 May 2010 14:40:41 Tatsuo Ishii wrote:
> > > No, it doesn't.
> > > Encoding is EUC_JP, locale is C. Included is the script to reproduce
> > > the problem.
> >
> > test=# select show_trgm('日本語');
> >
> > show_trgm
> >
> > ---
> >
> > No, it doesn't.
> > Encoding is EUC_JP, locale is C. Included is the script to reproduce
> > the problem.
> test=# select show_trgm('日本語');
> show_trgm
> ---
> {0x8194c0,0x836e53,0x1dc363,0x1e22e9}
> (1 row)
>
> Time: 0.44
Hi,
On Thursday 27 May 2010 13:53:37 Tatsuo Ishii wrote:
> > It's already multibyte safe since 8.4
>
> No, it doesn't.
> Encoding is EUC_JP, locale is C. Included is the script to reproduce
> the problem.
test=# select show_trgm('日本語');
show_trgm
--
> It's already multibyte safe since 8.4
No, it doesn't.
$ psql test
Pager usage is off.
psql (8.4.4)
Type "help" for help.
test=# select similarity('abc', 'abd'); -- OK
similarity
0.33
(1 row)
test=# select similarity('日本語', '日本後'); -- NG
similarity
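
For reference, the 0.33 result for 'abc' and 'abd' can be reproduced with a simplified C sketch. It makes several assumptions that the real module does not share: a single ASCII word per input, no case folding, two leading blanks and one trailing blank of padding, and similarity defined as shared trigrams divided by distinct trigrams in either input. The names make_trigrams, sort_unique and trigram_similarity are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_TRGM 64

/* Build the padded trigrams of one ASCII word; returns how many were made. */
static size_t make_trigrams(const char *word, char out[][4])
{
    char padded[MAX_TRGM + 4];
    size_t len = strlen(word);
    size_t n = 0;

    if (len == 0 || len > MAX_TRGM - 1)
        return 0;
    padded[0] = ' ';
    padded[1] = ' ';
    memcpy(padded + 2, word, len);
    padded[len + 2] = ' ';
    padded[len + 3] = '\0';
    for (size_t i = 0; i + 3 <= len + 3; i++)
    {
        memcpy(out[n], padded + i, 3);
        out[n][3] = '\0';
        n++;
    }
    return n;
}

static int cmp_trgm(const void *a, const void *b)
{
    return memcmp(a, b, 3);
}

/* Sort the trigrams and drop duplicates; returns the new count. */
static size_t sort_unique(char t[][4], size_t n)
{
    size_t kept = 0;

    qsort(t, n, sizeof t[0], cmp_trgm);
    for (size_t i = 0; i < n; i++)
    {
        if (kept == 0 || memcmp(t[kept - 1], t[i], 3) != 0)
        {
            if (kept != i)
                memcpy(t[kept], t[i], 4);
            kept++;
        }
    }
    return kept;
}

/* shared / (distinct trigrams in a or b), via a merge of two sorted lists. */
double trigram_similarity(const char *a, const char *b)
{
    char ta[MAX_TRGM][4], tb[MAX_TRGM][4];
    size_t na = sort_unique(ta, make_trigrams(a, ta));
    size_t nb = sort_unique(tb, make_trigrams(b, tb));
    size_t i = 0, j = 0, shared = 0;

    while (i < na && j < nb)
    {
        int c = memcmp(ta[i], tb[j], 3);

        if (c == 0) { shared++; i++; j++; }
        else if (c < 0) i++;
        else j++;
    }
    if (na + nb == shared)
        return 0.0;             /* both inputs produced no trigrams */
    return (double) shared / (double) (na + nb - shared);
}
```

'abc' and 'abd' each yield four trigrams and share two of them, giving 2 / 6. An input for which no character is accepted as a word element would yield no trigrams at all, which is the situation the multibyte case in this session appears to run into.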
Anyone working on making contrib/pg_trgm multibyte encoding aware? If
not, I'm interested in the work.
It's already multibyte safe since 8.4
--
Teodor Sigaev E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/
--
66 matches