Re: [HACKERS] More message encoding woes

2009-04-08 Thread Heikki Linnakangas
Peter Eisentraut wrote: On Tuesday 07 April 2009 13:09:42 Heikki Linnakangas wrote: Patch attached. Instead of checking for LC_CTYPE == C, I'm checking "pg_get_encoding_from_locale(NULL) == encoding" which is more close to what we actually want. The downside is that pg_get_encoding_from_locale(N

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Hiroshi Inoue
Tom Lane wrote: > Heikki Linnakangas writes: >> Hiroshi Inoue wrote: >>> What is wrong with checking if the codeset is valid using iconv_open()? > >> That would probably work as well. We'd have to decide what we'd try to >> convert from with iconv_open(). > > The problem I have with that is tha

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Peter Eisentraut
On Tuesday 07 April 2009 13:09:42 Heikki Linnakangas wrote: > Patch attached. Instead of checking for LC_CTYPE == C, I'm checking > "pg_get_encoding_from_locale(NULL) == encoding" which is more close to > what we actually want. The downside is that > pg_get_encoding_from_locale(NULL) isn't exactly

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas
Peter Eisentraut wrote: On Tuesday 07 April 2009 13:09:42 Heikki Linnakangas wrote: Patch attached. Instead of checking for LC_CTYPE == C, I'm checking "pg_get_encoding_from_locale(NULL) == encoding" which is more close to what we actually want. The downside is that pg_get_encoding_from_locale(N

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Tom Lane
Heikki Linnakangas writes: > Hiroshi Inoue wrote: >> What is wrong with checking if the codeset is valid using iconv_open()? > That would probably work as well. We'd have to decide what we'd try to > convert from with iconv_open(). The problem I have with that is that you are now guessing at *t

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas
Hiroshi Inoue wrote: What is wrong with checking if the codeset is valid using iconv_open()? That would probably work as well. We'd have to decide what we'd try to convert from with iconv_open(). Utf-8 might be a safe choice. We don't currently use iconv_open() anywhere in the backend, though

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Hiroshi Inoue
Heikki Linnakangas wrote: Hiroshi Inoue wrote: Heikki Linnakangas wrote: I just tried that, and it seems that gettext() does transliteration, so any characters that have no counterpart in the database encoding will be replaced with something similar, or question marks. Assuming that's univers

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas
Peter Eisentraut wrote: On Tuesday 07 April 2009 11:21:25 Heikki Linnakangas wrote: Using the name for the latin1 encoding in the currently Windows-only mapping table, "LATIN1", you get no translation because that name is not recognized by the system. Using the other name "ISO-8859-1", it works.

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas
Hiroshi Inoue wrote: Heikki Linnakangas wrote: I just tried that, and it seems that gettext() does transliteration, so any characters that have no counterpart in the database encoding will be replaced with something similar, or question marks. Assuming that's universal across platforms, and I

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Peter Eisentraut
On Tuesday 07 April 2009 11:21:25 Heikki Linnakangas wrote: > Peter Eisentraut wrote: > > In practice you get either the GNU or the Solaris version of gettext, and > > at least the GNU version can cope with all the encoding names that the > > currently Windows-only code path produces. > > It doesn'

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas
Peter Eisentraut wrote: In practice you get either the GNU or the Solaris version of gettext, and at least the GNU version can cope with all the encoding names that the currently Windows-only code path produces. It doesn't. On my laptop running Debian testing: hlinn...@heikkilaptop:~$ LC_ALL

Re: [HACKERS] More message encoding woes

2009-04-06 Thread Peter Eisentraut
On Monday 30 March 2009 15:52:37 Heikki Linnakangas wrote: > In CVS HEAD, we call bind_textdomain_codeset() in SetDatabaseEncoding() > which fixes that, but we only do it on Windows. In earlier versions we > called it on all platforms, but only for UTF-8. It seems that we should > call bind_textdom

Re: [HACKERS] More message encoding woes

2009-04-02 Thread Hiroshi Inoue
Hiroshi Inoue wrote: Heikki Linnakangas wrote: Tom Lane wrote: Heikki Linnakangas writes: Tom Lane wrote: Maybe use a special string "Translate Me First" that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in context. Yep, something like that. T

Re: [HACKERS] More message encoding woes

2009-04-02 Thread Peter Eisentraut
On Monday 30 March 2009 15:52:37 Heikki Linnakangas wrote: > What is happening is that gettext() returns the message in the encoding > determined by LC_CTYPE, while we expect it to return it in the database > encoding. Starting with PG 8.3 we enforce that the encoding specified in > LC_CTYPE matche
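
The mismatch described in this message can be reproduced in miniature. The sketch below is only an illustration: Python's string codecs stand in for gettext() and the server, and the message text and the `deliver` helper are hypothetical, not taken from the thread.

```python
# Illustration only: Python codecs stand in for gettext() and the backend.
# The message text and the deliver() helper are hypothetical.
message = "Ung\u00fcltige Eingabe"  # German text containing u-umlaut


def deliver(text, catalog_output_encoding, database_encoding):
    """Encode as the message catalog would, then decode as the client expects."""
    raw = text.encode(catalog_output_encoding)
    return raw.decode(database_encoding, errors="replace")


# Encodings agree: the text survives the round trip.
matched = deliver(message, "latin-1", "latin-1")

# Catalog emits UTF-8 but the database encoding is LATIN1: each multi-byte
# character turns into two unrelated LATIN1 characters.
mismatched = deliver(message, "utf-8", "latin-1")

print(ascii(matched))
print(ascii(mismatched))
```

Binding the catalog's output codeset to the database encoding corresponds to the first call, where both arguments agree.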

Re: [HACKERS] More message encoding woes

2009-04-02 Thread Hiroshi Inoue
Heikki Linnakangas wrote: Tom Lane wrote: Heikki Linnakangas writes: Tom Lane wrote: Maybe use a special string "Translate Me First" that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in context. Yep, something like that. There seems to be a mag

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Hiroshi Inoue
Tom Lane wrote: Hiroshi Inoue writes: Heikki Linnakangas wrote: I just tried that, and it seems that gettext() does transliteration, so any characters that have no counterpart in the database encoding will be replaced with something similar, or question marks. It doesn't occur in the curre

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Tom Lane
Hiroshi Inoue writes: > Heikki Linnakangas wrote: >> I just tried that, and it seems that gettext() does transliteration, so >> any characters that have no counterpart in the database encoding will be >> replaced with something similar, or question marks. > It doesn't occur in the current Windo
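
As a rough picture of the behaviour quoted above ("replaced with something similar, or question marks"), the following sketch imitates transliteration in Python. It is an assumption-laden imitation, not the algorithm GNU gettext or iconv actually uses; the `transliterate` function is hypothetical.

```python
import unicodedata


def transliterate(text, target_encoding):
    """Imitate lossy conversion: keep what fits, strip accents, else use '?'.

    Hypothetical helper; real gettext/iconv transliteration tables differ.
    """
    out = []
    for ch in text:
        try:
            ch.encode(target_encoding)
            out.append(ch)  # representable as-is
            continue
        except UnicodeEncodeError:
            pass
        # Decompose the character and drop combining marks (accents).
        base = "".join(
            c for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        )
        try:
            base.encode(target_encoding)
            out.append(base if base else "?")
        except UnicodeEncodeError:
            out.append("?")  # nothing similar exists in the target encoding
    return "".join(out)


# Czech text into LATIN1: carons are dropped, y-acute exists in LATIN1.
print(ascii(transliterate("\u017elu\u0165ou\u010dk\u00fd", "latin-1")))
```

Because the output is always valid in the target encoding, a conversion like this never fails outright, which is why a silent mismatch is hard to detect from the result alone.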

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Hiroshi Inoue
Heikki Linnakangas wrote: Tom Lane wrote: Heikki Linnakangas writes: Tom Lane wrote: Maybe use a special string "Translate Me First" that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in context. Yep, something like that. There seems to be a mag

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Alvaro Herrera
Tom Lane wrote: > Alvaro Herrera writes: > > One problem with this idea is that it may be hard to coerce gettext into > > putting a particular string at the top of the file :-( > > I doubt we can, which is why the documentation needs to tell translators > about it. I doubt that documenting the

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas writes: Tom Lane wrote: Maybe use a special string "Translate Me First" that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in context. Yep, something like that. There seems to be a magic empty string translation

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Tom Lane
Peter Eisentraut writes: > On Monday 30 March 2009 20:06:48 Heikki Linnakangas wrote: >> LC_CTYPE. In 8.3 and up where we constrain that to match the database >> encoding, we only have a problem with the C locale. > Why don't we apply the same restriction to the C locale then? (1) what would you

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Peter Eisentraut
On Monday 30 March 2009 20:06:48 Heikki Linnakangas wrote: > Tom Lane wrote: > > Where does it get the default codeset from? Maybe we could constrain > > that to match the database encoding, the way we do for LC_COLLATE/CTYPE? > > LC_CTYPE. In 8.3 and up where we constrain that to match the databa

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Peter Eisentraut
On Monday 30 March 2009 21:04:00 Tom Lane wrote: > Heikki Linnakangas writes: > > Tom Lane wrote: > >> Could we get away with just unconditionally calling > >> bind_textdomain_codeset with *our* canonical spelling of the encoding > >> name? If it works, great, and if it doesn't, you get English.

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Tom Lane
Alvaro Herrera writes: > Tom Lane wrote: >> At first that sounded like an ideal answer, but I can see a gotcha: >> suppose the translation's author's name contains some characters that >> don't convert to the database encoding. I suppose that would result in >> failure, when we'd prefer it not to

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Alvaro Herrera
Tom Lane wrote: > At first that sounded like an ideal answer, but I can see a gotcha: > suppose the translation's author's name contains some characters that > don't convert to the database encoding. I suppose that would result in > failure, when we'd prefer it not to. A single-purpose string co

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Tom Lane
Heikki Linnakangas writes: > Tom Lane wrote: >> Maybe use a special string "Translate Me First" that >> doesn't actually need to be end-user-visible, just so no one sweats over >> getting it right in context. > Yep, something like that. There seems to be a magic empty string > translation at the

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas writes: I'm leaning towards the idea of trying out all the spellings of the database encoding we have in encoding_match_list. That gives the best user experience, as it just works, and it doesn't seem that complicated. How were you going to check --- use th

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Tom Lane
Heikki Linnakangas writes: > I'm leaning towards the idea of trying out all the spellings of the > database encoding we have in encoding_match_list. That gives the best > user experience, as it just works, and it doesn't seem that complicated. How were you going to check --- use that idea of tr
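
The "try every spelling" idea can be sketched as a simple loop. Everything here is an assumption for illustration: the `SPELLINGS` table is a made-up excerpt rather than PostgreSQL's encoding_match_list, and Python's codec registry stands in for whatever check the platform would provide (the thread leaves open how that check should be done).

```python
import codecs

# Hypothetical excerpt of a spelling table; not PostgreSQL's actual list.
SPELLINGS = {
    "LATIN1": ["LATIN1", "ISO-8859-1", "ISO8859-1", "iso88591"],
}


def is_known(name):
    """Stand-in validity check: ask Python's codec registry, not the OS."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def first_accepted(encoding, accepts=is_known):
    """Return the first spelling the checker accepts, or None if none is."""
    for spelling in SPELLINGS.get(encoding, []):
        if accepts(spelling):
            return spelling
    return None


# Simulate a system that rejects "LATIN1" but knows "ISO-8859-1".
picky = lambda name: name != "LATIN1"
print(first_accepted("LATIN1", accepts=picky))
```

The loop is only as good as the `accepts` check; if the underlying call cannot report an unknown name, as the thread worries about bind_textdomain_codeset, the first spelling always wins regardless of whether it works.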

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Heikki Linnakangas
Heikki Linnakangas wrote: One idea is to extract the encoding from LC_MESSAGES. Then call pg_get_encoding_from_locale() on that and check that it matches server_encoding. If it does, great, pass it to bind_textdomain_codeset(). If it doesn't, throw an error. I tried to implement this but it g

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Zdenek Kotala
Tom Lane wrote on Mon, 30 Mar 2009 at 14:04 -0400: > Heikki Linnakangas writes: > > Tom Lane wrote: > >> Could we get away with just unconditionally calling > >> bind_textdomain_codeset with *our* canonical spelling of the encoding > >> name? If it works, great, and if it doesn't, you get English. >

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas
Tom Lane wrote: What we need is an API equivalent to "iconv --list", but I'm not seeing one :-(. There's also "locale -m". Looking at the implementation of that, it just lists what's in /usr/share/i18n/charmaps. Not too portable either.. Do we need to go so far as to try to run that progra

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Tom Lane
Heikki Linnakangas writes: > Tom Lane wrote: >> Could we get away with just unconditionally calling >> bind_textdomain_codeset with *our* canonical spelling of the encoding >> name? If it works, great, and if it doesn't, you get English. > Yeah, that's better than nothing. A quick look at the o

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas writes: Tom Lane wrote: Where does it get the default codeset from? Maybe we could constrain that to match the database encoding, the way we do for LC_COLLATE/CTYPE? LC_CTYPE. In 8.3 and up where we constrain that to match the database encoding, we only h

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Tom Lane
Heikki Linnakangas writes: > Tom Lane wrote: >> Where does it get the default codeset from? Maybe we could constrain >> that to match the database encoding, the way we do for LC_COLLATE/CTYPE? > LC_CTYPE. In 8.3 and up where we constrain that to match the database > encoding, we only have a pro

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas
Tom Lane wrote: Where does it get the default codeset from? Maybe we could constrain that to match the database encoding, the way we do for LC_COLLATE/CTYPE? LC_CTYPE. In 8.3 and up where we constrain that to match the database encoding, we only have a problem with the C locale. -- Heikki

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Tom Lane
Heikki Linnakangas writes: > Tom Lane wrote: >> Another idea is to try the values listed in our encoding_match_list[] >> until bind_textdomain_codeset succeeds. The problem here is that the >> GNU documentation is *exceedingly* vague about whether >> bind_textdomain_codeset behaves sanely (ie thr

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas
Tom Lane wrote: Another idea is to try the values listed in our encoding_match_list[] until bind_textdomain_codeset succeeds. The problem here is that the GNU documentation is *exceedingly* vague about whether bind_textdomain_codeset behaves sanely (ie throws a recognizable error) when given a b

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Tom Lane
Heikki Linnakangas writes: > In CVS HEAD, we call bind_textdomain_codeset() in SetDatabaseEncoding() > which fixes that, but we only do it on Windows. In earlier versions we > called it on all platforms, but only for UTF-8. It seems that we should > call bind_textdomain_codeset on all platforms

[HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas
latin1db=# SELECT version();
 version
---
 PostgreSQL 8.3.7 on i686-pc-linux-gnu, compiled by GCC gcc (Debian 4.3.3-5) 4.3.3
(1 row)
latin1db=# SELECT name, setting FROM pg_se