Re: [HACKERS] unaccent extension missing some accents

2011-11-10 Thread Tom Lane
Bruce Momjian writes: > Tom Lane wrote: >> However, the bigger picture is that OS X's UTF8 locales are broken >> through-and-through, and most of their other problems are not feasible >> to work around. > If Apple's low-level code came from FreeBSD and NetBSD, how did they get > so broken? AFAIK

Re: [HACKERS] unaccent extension missing some accents

2011-11-10 Thread Bruce Momjian
Tom Lane wrote: > J Smith writes: > > I've attached a patch against master for unaccent.c that uses swscanf > > along with char2wchar and wchar2char instead of sscanf directly to > > initialize the unaccent extension and it appears to fix the problem in > > both the master and 9.1 branches. > > s

Re: [HACKERS] unaccent extension missing some accents

2011-11-07 Thread J Smith
On Mon, Nov 7, 2011 at 11:53 AM, Florian Pflug wrote: > > Various issues with OSX and UTF-8 locales seem to come up quite often, yet > we're not really in a position to do anything about them. > > Thus, I think we should warn about these issues and save people the trouble > of finding out about t

Re: [HACKERS] unaccent extension missing some accents

2011-11-07 Thread Tom Lane
J Smith writes: > Would it even really be worth it to look into any of the other locale > issues on OSX, given that PostgreSQL is now included in their default > installs starting with 10.7, or would this really be more of a case of > hoping Apple some day fixes the issue upstream? To my mind, th

Re: [HACKERS] unaccent extension missing some accents

2011-11-07 Thread J Smith
On Mon, Nov 7, 2011 at 11:59 AM, Tom Lane wrote: > > If you have time to check that the patch I just committed fixes your > problem, it'd be worth doing.  I did not test it on OS X ... Looks good to me, thanks. Would it even really be worth it to look into any of the other locale issues on OSX,

Re: [HACKERS] unaccent extension missing some accents

2011-11-07 Thread Tom Lane
J Smith writes: > Anyways, lemme know if there's anything else I could help with or > could test and whatnot. Cheers. If you have time to check that the patch I just committed fixes your problem, it'd be worth doing. I did not test it on OS X ... regards, tom lane -- S

Re: [HACKERS] unaccent extension missing some accents

2011-11-07 Thread J Smith
On Mon, Nov 7, 2011 at 11:12 AM, Tom Lane wrote: > I looked at this a bit and realized that sscanf is actually doing a > couple of critical things for us, which are lost in translation when > doing it like this: > > 1. It ignores whitespace other than the dividing tab.  If we don't > continue to d

Re: [HACKERS] unaccent extension missing some accents

2011-11-07 Thread Florian Pflug
On Nov7, 2011, at 17:46 , J Smith wrote: > On Mon, Nov 7, 2011 at 11:12 AM, Tom Lane wrote: >> If OS X's UTF8 locales weren't so thoroughly broken (eg sorting does not >> work), I might be tempted to try to do it that way, but I still fail >> to see the point. After reviewing the code I feel that

Re: [HACKERS] unaccent extension missing some accents

2011-11-07 Thread Tom Lane
J Smith writes: > Alright, I wrote up another patch that uses strchr to parse out the > lines of the unaccent.rules file, foregoing sscanf completely. > Hopefully this looks a bit better than using swscanf. I looked at this a bit and realized that sscanf is actually doing a couple of critical thi
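
For readers following the thread, here is a minimal sketch of parsing a rules line without sscanf while still skipping stray whitespace around the two fields. It is not the patch discussed in this thread. The names is_ascii_space and split_rule_line are hypothetical, and the sketch assumes each line holds a source token and a target token separated by ASCII whitespace. Testing bytes against a fixed ASCII set means a UTF-8 continuation byte is never classified as whitespace, whatever the C locale says.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: ASCII-only whitespace test, independent of the C locale. */
static int
is_ascii_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\r' || c == '\v' || c == '\f';
}

/*
 * Hypothetical sketch: split "src <whitespace> trg" in place.
 * On success, *src and *trg point at NUL-terminated tokens inside line
 * and the function returns 1. It returns 0 unless the line holds
 * exactly two tokens.
 */
static int
split_rule_line(char *line, char **src, char **trg)
{
    char *p = line;

    /* Skip leading whitespace, then mark the start of the source token. */
    while (is_ascii_space((unsigned char) *p))
        p++;
    if (*p == '\0')
        return 0;
    *src = p;

    /* The source token runs up to the next ASCII whitespace byte. */
    while (*p != '\0' && !is_ascii_space((unsigned char) *p))
        p++;
    if (*p == '\0')
        return 0;               /* no target token on this line */
    *p++ = '\0';

    /* Skip the separator, then mark the start of the target token. */
    while (is_ascii_space((unsigned char) *p))
        p++;
    if (*p == '\0')
        return 0;
    *trg = p;

    while (*p != '\0' && !is_ascii_space((unsigned char) *p))
        p++;
    if (*p != '\0')
    {
        *p++ = '\0';
        /* Anything other than trailing whitespace is an error. */
        while (is_ascii_space((unsigned char) *p))
            p++;
        if (*p != '\0')
            return 0;
    }
    return 1;
}
```

The caller passes a writable buffer holding one line. The function only inserts NUL terminators and never copies, so both tokens stay valid for as long as the buffer does.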

Re: [HACKERS] unaccent extension missing some accents

2011-11-06 Thread J Smith
Alright, I wrote up another patch that uses strchr to parse out the lines of the unaccent.rules file, foregoing sscanf completely. Hopefully this looks a bit better than using swscanf. As for the other problems with isspace and such on OSX, it might be worth looking at the python portability fixes

Re: [HACKERS] unaccent extension missing some accents

2011-11-06 Thread J Smith
On 2011-11-06, at 7:15 PM, Tom Lane wrote: > > swscanf doesn't seem like an acceptable approach: it's a function that > is relied on nowhere else in PG, so it adds new portability risks of its > own. It doesn't exist on some platforms that we support (like the one > I'm typing this message on) an

Re: [HACKERS] unaccent extension missing some accents

2011-11-06 Thread Tom Lane
J Smith writes: > I've attached a patch against master for unaccent.c that uses swscanf > along with char2wchar and wchar2char instead of sscanf directly to > initialize the unaccent extension and it appears to fix the problem in > both the master and 9.1 branches. swscanf doesn't seem like an ac

Re: [HACKERS] unaccent extension missing some accents

2011-11-06 Thread J Smith
On Sun, Nov 6, 2011 at 1:18 PM, Florian Pflug wrote: > > What's the locale of the database you're seeing this in, and which charset > does it use? > > I think scanf() uses isspace() and friends, and last time I looked the > locale definitions were all pretty bogus on OSX. So maybe scanf() somehow

Re: [HACKERS] unaccent extension missing some accents

2011-11-06 Thread Florian Pflug
On Nov6, 2011, at 18:43 , J Smith wrote: > I put some elog debugging lines into unaccent.c and found that sscanf > sometimes reads the scanned line by finding only one byte for > the source character rather than the two required for the complete > UTF-8 code point. It appears that the follo
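
The symptom described above, a token cut off after the first byte of a two-byte UTF-8 sequence, can be detected with a small structural check. The following is a sketch only, not code from unaccent.c, and the name utf8_tokens_complete is hypothetical. It checks lead-byte lengths and continuation-byte patterns and does not reject overlong or surrogate encodings.

```c
#include <stddef.h>

/*
 * Hypothetical check: returns 1 if the len bytes at s consist only of
 * structurally complete UTF-8 sequences, 0 if a sequence is truncated
 * or starts with a byte that cannot be a lead byte.
 */
static int
utf8_tokens_complete(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = s[i];
        size_t need;
        size_t k;

        /* The lead byte determines how long the sequence must be. */
        if (c < 0x80)
            need = 1;
        else if ((c & 0xE0) == 0xC0)
            need = 2;
        else if ((c & 0xF0) == 0xE0)
            need = 3;
        else if ((c & 0xF8) == 0xF0)
            need = 4;
        else
            return 0;           /* stray continuation byte or invalid lead */

        if (need > len - i)
            return 0;           /* sequence cut short, as in the report */

        /* Every following byte must match the pattern 10xxxxxx. */
        for (k = 1; k < need; k++)
        {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
        }
        i += need;
    }
    return 1;
}
```

Running such a check on each parsed source token and logging the failures would pinpoint which rules lines are being split in the middle of a code point.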

Re: [HACKERS] unaccent extension missing some accents

2011-11-06 Thread J Smith
Gah! Accidentally hit Send. Let me finish that last message before sending this time! G'day list. I've been messing around with the unaccent extension and I've noticed that some of the characters listed in the unaccent.rules file aren't actually being unaccented on my system. Here are the syste

[HACKERS] unaccent extension missing some accents

2011-11-06 Thread J Smith
G'day list. I've been messing around with the unaccent extension and I've noticed that some of the characters listed in the unaccent.rules file aren't actually being unaccented on my system. Here are the system details and whatnot. - OSX 10.7.2 - the server is compiled via macports. Tried using