Re: [BUGS] BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)

2012-02-15 Thread Duncan Rance
On 14 Feb 2012, at 18:28, Tom Lane wrote:
> Oh, I see the reason for this: the code in cclass() in regc_locale.c doesn't go further up than U+00FF, so no codes above that will be thought to be letters (or members of any other character class). Clearly we need to go further when we are dealing

[BUGS] BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)

2012-02-14 Thread albert . cieszkowski
The following bug has been logged on the website:
Bug reference: 6457
Logged by: Albert Cieszkowski
Email address: albert.cieszkow...@cc.com.pl
PostgreSQL version: 9.0.6
Operating system: CentOS 5.x
Description: OS, base and client encoding UTF-8:
peimp= select

Re: [BUGS] BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)

2012-02-14 Thread Tom Lane
albert.cieszkow...@cc.com.pl writes:
> OS, base and client encoding UTF-8:
What are your lc_collate/lc_ctype settings?
regards, tom lane

Re: [BUGS] BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)

2012-02-14 Thread Albert Cieszkowski
Hello Tom,
Every lc_x value is pl_PL.UTF8 (corresponding to the word's language). The database was created with --locale=pl_PL.UTF8. The OS (CentOS 5.x) uses en_US.UTF-8.
Best regards, Albert Cieszkowski
On 2012-02-14 16:27, Tom Lane wrote:

Re: [BUGS] BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)

2012-02-14 Thread Tom Lane
albert.cieszkow...@cc.com.pl writes:
> peimp= select 'Świnoujście' ~* '\mŚwinoujście\M';
>  ?column?
> ----------
>  f
> (1 row)
Oh, I see the reason for this: the code in cclass() in regc_locale.c doesn't go further up than U+00FF, so no codes above that will be thought to be letters (or members
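
To make the diagnosis above concrete, here is a minimal toy model in Python. It is not the code from regc_locale.c; the function names and the simplified notion of a "word character" (alphanumeric or underscore) are assumptions made for illustration. It emulates the \m and \M word-boundary constraints twice: once with a character-class test that ignores everything above U+00FF, and once with no such cap. 'Ś' is U+015A, so under the capped test it is not a word character and \m cannot match in front of it.

```python
CAP = 0xFF  # cutoff described in the message above


def is_word_capped(ch):
    """Word-character test that knows nothing above U+00FF (hypothetical)."""
    return ord(ch) <= CAP and (ch.isalnum() or ch == "_")


def is_word_full(ch):
    """Word-character test with no cap on the code point (hypothetical)."""
    return ch.isalnum() or ch == "_"


def word_start(text, i, is_word):
    # \m: the next character is a word character, the previous one is not
    return i < len(text) and is_word(text[i]) and (i == 0 or not is_word(text[i - 1]))


def word_end(text, i, is_word):
    # \M: the previous character is a word character, the next one is not
    return i > 0 and is_word(text[i - 1]) and (i == len(text) or not is_word(text[i]))


def whole_word_match(text, word, is_word):
    """Rough equivalent of text ~ '\\mWORD\\M' (case-sensitive for brevity)."""
    for i in range(len(text) - len(word) + 1):
        j = i + len(word)
        if text[i:j] == word and word_start(text, i, is_word) and word_end(text, j, is_word):
            return True
    return False


if __name__ == "__main__":
    name = "Świnoujście"
    print(whole_word_match(name, name, is_word_capped))  # capped class test
    print(whole_word_match(name, name, is_word_full))    # uncapped class test
```

In this model the capped test fails at the leading 'Ś', while the trailing 'e' is plain ASCII and would satisfy \M either way. The same word written with only ASCII letters matches under both tests.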