Re: [GENERAL] text column indexing in UTF-8 database

2009-03-13 Thread Tom Lane
Jeff Davis writes: > One thing that still doesn't make sense to me is that texteq() is > bitwise-equality even in 8.3. Historical artifact ... we made the semantics change some time ago, but the ensuing change to remove ~=~ didn't happen until 8.4. > It sounds like Reece Hart can avoid the extra

Re: [GENERAL] text column indexing in UTF-8 database

2009-03-13 Thread Jeff Davis
On Fri, 2009-03-13 at 13:13 -0400, Tom Lane wrote: > There is actually some history here; the former distinction in the > equality operators arose from exactly your concern. But after we > put in the second-pass check to insist on bitwise equality, we > realized that the equality operators really

Re: [GENERAL] text column indexing in UTF-8 database

2009-03-13 Thread Tom Lane
Jeff Davis writes: > It looks like an index using text_pattern_ops can be used for equality > (see my test case below). This is true as of 8.4; prior versions make a distinction between = and ~=~. > This works apparently because texteq() is defined as bitwise-equality. > Is that really correct?

Re: [GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Reece Hart
On Thu, 2009-03-12 at 17:32 -0700, Steve Atkins wrote: > If A=B then lower(A) = lower(B), and if A like B then lower(A) like > lower(B). > > So, if nothing else, you could rewrite "where alias = 'Foo'" as > "where lower(alias) = lower('Foo') and alias='Foo'" and take advantage > of the lower()

Re: [GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Reece Hart
On Thu, 2009-03-12 at 18:02 -0700, Jeff Davis wrote: > It looks like an index using text_pattern_ops can be used for equality > (see my test case below). Odd. I can't reproduce your test case. I noticed that I edited out the version and platform from my OP. (A: 8.3.6, x86_64 linux). You're on 8.3.

Re: [GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Jeff Davis
On Thu, 2009-03-12 at 17:15 -0700, Reece Hart wrote: > Jeff Davis gave me a tip to use text_pattern_ops on indexes to speed up > regexp and like; that worked beautiful. But I discovered a caveat that > t_p_o apparently doesn't handle equality. Thus, I think I need distinct > indexes for the 4 cases

Re: [GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Steve Atkins
On Mar 12, 2009, at 5:15 PM, Reece Hart wrote: Do I really need 4 indexes per column to handle the 4 combinations of {equality,like/regexp} x {unfolded,casefolded} in a UTF-8 encoded database? I have a column that I'd like to be able to search with equality and regexp (or like), optionally cas

[GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Reece Hart
Do I really need 4 indexes per column to handle the 4 combinations of {equality,like/regexp} x {unfolded,casefolded} in a UTF-8 encoded database? I have a column that I'd like to be able to search with equality and regexp (or like), optionally casefolded. The database is UTF-8 encoded. The table a