Re: [GENERAL] Unicode normalization

2009-09-17 Thread Sam Mason
On Thu, Sep 17, 2009 at 12:01:57AM -0400, Alvaro Herrera wrote: http://wiki.postgresql.org/wiki/Strip_accents_from_strings I'm still confused as to why plpython doesn't know the server's encoding already; seems as though all text operations are predicated on knowing this and hence all but the

Re: [GENERAL] Unicode normalization

2009-09-17 Thread Andreas Kalsch
My standard encoding is UTF-8 on all levels so I don't need this high-cost call: plpy.execute("select setting from pg_settings where name = 'server_encoding'"); Additionally I want to get the original cases. For this purpose my solution still fits my needs. But it is not the one you

[GENERAL] Unicode normalization

2009-09-16 Thread Andreas Kalsch
Has somebody integrated Unicode normalization into Postgres? If not, I would have to implement my own function by using this CPAN module: http://search.cpan.org/~sadahiro/Unicode-Normalize-1.03/ . I need a function which removes all diacritics (1) and transforms some characters to a more
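
For readers skimming the thread, the "remove all diacritics" requirement can be sketched with Python's standard unicodedata module, which the later replies also use. This is only an illustration in Python 3, not code from any message in the thread; the function name strip_diacritics is hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # NFKD splits a precomposed character such as "é" into a base
    # letter followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a non-zero class for combining
    # marks, so keeping only class-0 characters drops the accents.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("café"))
```

Characters that have no Unicode decomposition, such as "ø" or "ß", pass through unchanged, so a separate mapping would still be needed for them.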

Re: [GENERAL] Unicode normalization

2009-09-16 Thread David Fetter
On Wed, Sep 16, 2009 at 07:20:21PM +0200, Andreas Kalsch wrote: Has somebody integrated Unicode normalization into Postgres? if not, I would have to implement my own function by using this CPAN module: http://search.cpan.org/~sadahiro/Unicode-Normalize-1.03/ . I need a function which

Re: [GENERAL] Unicode normalization

2009-09-16 Thread Andreas Kalsch
No, I need a solution which is as generic as possible. I use UTF-8 encoded unicode strings on all levels. This is what I have done so far: 1) Writing a separate Python command line script for testing - works as expected: #!/usr/bin/python import sys import unicodedata str =

Re: [GENERAL] Unicode normalization

2009-09-16 Thread Andreas Kalsch
Update: The error is of course: The function tries to return str instead of unicode. It is not str.decode('UTF-8') which causes the error. Andreas Kalsch wrote: No, I need a solution which is as generic as possible. I use UTF-8 encoded unicode strings on all levels. This is what I have
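
The byte-string versus Unicode-string mismatch described here can be illustrated outside the database. The sketch below is Python 3, where bytes plays the role that str played in the Python 2 used by plpythonu at the time; it assumes UTF-8 throughout, as the poster states, and the function name normalize_utf8_bytes is hypothetical. It is not the fix proposed in the thread, whose text is truncated in this listing.

```python
import unicodedata


def normalize_utf8_bytes(raw: bytes, form: str = "NFKD") -> bytes:
    """Decode UTF-8 bytes, normalize, and encode back to UTF-8 bytes."""
    # Normalization operates on Unicode text, so decode first.
    text = raw.decode("utf-8")
    normalized = unicodedata.normalize(form, text)
    # Encode again so the caller gets back the same type it passed in.
    return normalized.encode("utf-8")


print(normalize_utf8_bytes("é".encode("utf-8")))
```

The point of the round trip is that the value handed back has the same type as the value received, which avoids an implicit conversion with some other default encoding.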

Re: [GENERAL] Unicode normalization

2009-09-16 Thread Sam Mason
On Wed, Sep 16, 2009 at 09:35:02PM +0200, Andreas Kalsch wrote: CREATE OR REPLACE FUNCTION test (str text) RETURNS text AS $$ import unicodedata return unicodedata.normalize('NFKD', str.decode('UTF-8')) $$ LANGUAGE plpythonu; I'd guess you want that to be: return

Re: [GENERAL] Unicode normalization

2009-09-16 Thread Scott Marlowe
On Wed, Sep 16, 2009 at 4:42 PM, Sam Mason s...@samason.me.uk wrote: On Wed, Sep 16, 2009 at 09:35:02PM +0200, Andreas Kalsch wrote: CREATE OR REPLACE FUNCTION test (str text)  RETURNS text AS $$    import unicodedata    return unicodedata.normalize('NFKD', str.decode('UTF-8')) $$ LANGUAGE

Re: [GENERAL] Unicode normalization

2009-09-16 Thread Alvaro Herrera
Andreas Kalsch wrote: 2) Transferring this to PL/Python: CREATE OR REPLACE FUNCTION test (str text) RETURNS text AS $$ import unicodedata return unicodedata.normalize('NFKD', str.decode('UTF-8')) $$ LANGUAGE plpythonu; This is wrong, which is why we published a correct version