
Re: [HACKERS] More message encoding woes

2009-04-08 Thread Heikki Linnakangas

Peter Eisentraut wrote: On Tuesday 07 April 2009 13:09:42 Heikki Linnakangas wrote: Patch attached. Instead of checking for LC_CTYPE == C, I'm checking "pg_get_encoding_from_locale(NULL) == encoding" which is more close to what we actually want. The downside is that pg_get_encoding_from_locale(N

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Hiroshi Inoue

Tom Lane wrote: > Heikki Linnakangas writes: >> Hiroshi Inoue wrote: >>> What is wrong with checking if the codeset is valid using iconv_open()? > >> That would probably work as well. We'd have to decide what we'd try to >> convert from with iconv_open(). > > The problem I have with that is tha

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Peter Eisentraut

On Tuesday 07 April 2009 13:09:42 Heikki Linnakangas wrote: > Patch attached. Instead of checking for LC_CTYPE == C, I'm checking > "pg_get_encoding_from_locale(NULL) == encoding" which is more close to > what we actually want. The downside is that > pg_get_encoding_from_locale(NULL) isn't exactly

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas

Peter Eisentraut wrote: On Tuesday 07 April 2009 13:09:42 Heikki Linnakangas wrote: Patch attached. Instead of checking for LC_CTYPE == C, I'm checking "pg_get_encoding_from_locale(NULL) == encoding" which is more close to what we actually want. The downside is that pg_get_encoding_from_locale(N

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Tom Lane

Heikki Linnakangas writes: > Hiroshi Inoue wrote: >> What is wrong with checking if the codeset is valid using iconv_open()? > That would probably work as well. We'd have to decide what we'd try to > convert from with iconv_open(). The problem I have with that is that you are now guessing at *t

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas

Hiroshi Inoue wrote: What is wrong with checking if the codeset is valid using iconv_open()? That would probably work as well. We'd have to decide what we'd try to convert from with iconv_open(). Utf-8 might be a safe choice. We don't currently use iconv_open() anywhere in the backend, though
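The iconv_open() idea raised above can be sketched roughly as follows. This is only an illustration, not code from the thread or from the PostgreSQL tree: the helper name codeset_is_known() is hypothetical, and converting from "UTF-8" is an assumption taken from the "Utf-8 might be a safe choice" remark. Whether iconv and gettext accept the same set of codeset names is exactly what the thread leaves open, so a positive answer here is a hint rather than a guarantee.

```c
#include <assert.h>
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of PostgreSQL): report whether the local
 * iconv implementation knows a conversion from UTF-8 to the named codeset.
 */
bool
codeset_is_known(const char *codeset)
{
    iconv_t cd;

    assert(codeset != NULL);

    /* iconv_open() takes the target codeset first, then the source. */
    cd = iconv_open(codeset, "UTF-8");
    if (cd == (iconv_t) -1)
        return false;           /* conversion not available on this system */

    iconv_close(cd);
    return true;
}
```

A caller could try each known spelling of the database encoding with this check and pass only an accepted one to bind_textdomain_codeset(). Which spellings are accepted (for example "LATIN1" versus "ISO-8859-1") depends on the platform's iconv.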

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Hiroshi Inoue

Heikki Linnakangas wrote: Hiroshi Inoue wrote: Heikki Linnakangas wrote: I just tried that, and it seems that gettext() does transliteration, so any characters that have no counterpart in the database encoding will be replaced with something similar, or question marks. Assuming that's univers

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas

Peter Eisentraut wrote: On Tuesday 07 April 2009 11:21:25 Heikki Linnakangas wrote: Using the name for the latin1 encoding in the currently Windows-only mapping table, "LATIN1", you get no translation because that name is not recognized by the system. Using the other name "ISO-8859-1", it works.

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas

Hiroshi Inoue wrote: Heikki Linnakangas wrote: I just tried that, and it seems that gettext() does transliteration, so any characters that have no counterpart in the database encoding will be replaced with something similar, or question marks. Assuming that's universal across platforms, and I

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Peter Eisentraut

On Tuesday 07 April 2009 11:21:25 Heikki Linnakangas wrote: > Peter Eisentraut wrote: > > In practice you get either the GNU or the Solaris version of gettext, and > > at least the GNU version can cope with all the encoding names that the > > currently Windows-only code path produces. > > It doesn'

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas

Peter Eisentraut wrote: In practice you get either the GNU or the Solaris version of gettext, and at least the GNU version can cope with all the encoding names that the currently Windows-only code path produces. It doesn't. On my laptop running Debian testing: hlinn...@heikkilaptop:~$ LC_ALL

Re: [HACKERS] More message encoding woes

2009-04-06 Thread Peter Eisentraut

On Monday 30 March 2009 15:52:37 Heikki Linnakangas wrote: > In CVS HEAD, we call bind_textdomain_codeset() in SetDatabaseEncoding() > which fixes that, but we only do it on Windows. In earlier versions we > called it on all platforms, but only for UTF-8. It seems that we should > call bind_textdom

Re: [HACKERS] More message encoding woes

2009-04-02 Thread Hiroshi Inoue

Hiroshi Inoue wrote: Heikki Linnakangas wrote: Tom Lane wrote: Heikki Linnakangas writes: Tom Lane wrote: Maybe use a special string "Translate Me First" that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in context. Yep, something like that. T

Re: [HACKERS] More message encoding woes

2009-04-02 Thread Peter Eisentraut

On Monday 30 March 2009 15:52:37 Heikki Linnakangas wrote: > What is happening is that gettext() returns the message in the encoding > determined by LC_CTYPE, while we expect it to return it in the database > encoding. Starting with PG 8.3 we enforce that the encoding specified in > LC_CTYPE matche

Re: [HACKERS] More message encoding woes

2009-04-02 Thread Hiroshi Inoue

Heikki Linnakangas wrote: Tom Lane wrote: Heikki Linnakangas writes: Tom Lane wrote: Maybe use a special string "Translate Me First" that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in context. Yep, something like that. There seems to be a mag

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Hiroshi Inoue

Tom Lane wrote: Hiroshi Inoue writes: Heikki Linnakangas wrote: I just tried that, and it seems that gettext() does transliteration, so any characters that have no counterpart in the database encoding will be replaced with something similar, or question marks. It doesn't occur in the curre

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Tom Lane

Hiroshi Inoue writes: > Heikki Linnakangas wrote: >> I just tried that, and it seems that gettext() does transliteration, so >> any characters that have no counterpart in the database encoding will be >> replaced with something similar, or question marks. > It doesn't occur in the current Windo

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Hiroshi Inoue

Heikki Linnakangas wrote: Tom Lane wrote: Heikki Linnakangas writes: Tom Lane wrote: Maybe use a special string "Translate Me First" that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in context. Yep, something like that. There seems to be a mag

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Alvaro Herrera

Tom Lane wrote: > Alvaro Herrera writes: > > One problem with this idea is that it may be hard to coerce gettext into > > putting a particular string at the top of the file :-( > > I doubt we can, which is why the documentation needs to tell translators > about it. I doubt that documenting the

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Heikki Linnakangas

Tom Lane wrote: Heikki Linnakangas writes: Tom Lane wrote: Maybe use a special string "Translate Me First" that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in context. Yep, something like that. There seems to be a magic empty string translation

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Tom Lane

Peter Eisentraut writes: > On Monday 30 March 2009 20:06:48 Heikki Linnakangas wrote: >> LC_CTYPE. In 8.3 and up where we constrain that to match the database >> encoding, we only have a problem with the C locale. > Why don't we apply the same restriction to the C locale then? (1) what would you

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Peter Eisentraut

On Monday 30 March 2009 20:06:48 Heikki Linnakangas wrote: > Tom Lane wrote: > > Where does it get the default codeset from? Maybe we could constrain > > that to match the database encoding, the way we do for LC_COLLATE/CTYPE? > > LC_CTYPE. In 8.3 and up where we constrain that to match the databa

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Peter Eisentraut

On Monday 30 March 2009 21:04:00 Tom Lane wrote: > Heikki Linnakangas writes: > > Tom Lane wrote: > >> Could we get away with just unconditionally calling > >> bind_textdomain_codeset with *our* canonical spelling of the encoding > >> name? If it works, great, and if it doesn't, you get English.

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Tom Lane

Alvaro Herrera writes: > Tom Lane wrote: >> At first that sounded like an ideal answer, but I can see a gotcha: >> suppose the translation's author's name contains some characters that >> don't convert to the database encoding. I suppose that would result in >> failure, when we'd prefer it not to

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Alvaro Herrera

Tom Lane wrote: > At first that sounded like an ideal answer, but I can see a gotcha: > suppose the translation's author's name contains some characters that > don't convert to the database encoding. I suppose that would result in > failure, when we'd prefer it not to. A single-purpose string co

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Tom Lane

Heikki Linnakangas writes: > Tom Lane wrote: >> Maybe use a special string "Translate Me First" that >> doesn't actually need to be end-user-visible, just so no one sweats over >> getting it right in context. > Yep, something like that. There seems to be a magic empty string > translation at the

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Heikki Linnakangas

Tom Lane wrote: Heikki Linnakangas writes: I'm leaning towards the idea of trying out all the spellings of the database encoding we have in encoding_match_list. That gives the best user experience, as it just works, and it doesn't seem that complicated. How were you going to check --- use th

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Tom Lane

Heikki Linnakangas writes: > I'm leaning towards the idea of trying out all the spellings of the > database encoding we have in encoding_match_list. That gives the best > user experience, as it just works, and it doesn't seem that complicated. How were you going to check --- use that idea of tr

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Heikki Linnakangas

Heikki Linnakangas wrote: One idea is to extract the encoding from LC_MESSAGES. Then call pg_get_encoding_from_locale() on that and check that it matches server_encoding. If it does, great, pass it to bind_textdomain_codeset(). If it doesn't, throw an error. I tried to implement this but it g

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Zdenek Kotala

Tom Lane wrote on Mon, 30 Mar 2009 at 14:04 -0400: > Heikki Linnakangas writes: > > Tom Lane wrote: > >> Could we get away with just unconditionally calling > >> bind_textdomain_codeset with *our* canonical spelling of the encoding > >> name? If it works, great, and if it doesn't, you get English. >

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas

Tom Lane wrote: What we need is an API equivalent to "iconv --list", but I'm not seeing one :-(. There's also "locale -m". Looking at the implementation of that, it just lists what's in /usr/share/i18n/charmaps. Not too portable either.. Do we need to go so far as to try to run that progra

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Tom Lane

Heikki Linnakangas writes: > Tom Lane wrote: >> Could we get away with just unconditionally calling >> bind_textdomain_codeset with *our* canonical spelling of the encoding >> name? If it works, great, and if it doesn't, you get English. > Yeah, that's better than nothing. A quick look at the o

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas

Tom Lane wrote: Heikki Linnakangas writes: Tom Lane wrote: Where does it get the default codeset from? Maybe we could constrain that to match the database encoding, the way we do for LC_COLLATE/CTYPE? LC_CTYPE. In 8.3 and up where we constrain that to match the database encoding, we only h

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Tom Lane

Heikki Linnakangas writes: > Tom Lane wrote: >> Where does it get the default codeset from? Maybe we could constrain >> that to match the database encoding, the way we do for LC_COLLATE/CTYPE? > LC_CTYPE. In 8.3 and up where we constrain that to match the database > encoding, we only have a pro

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas

Tom Lane wrote: Where does it get the default codeset from? Maybe we could constrain that to match the database encoding, the way we do for LC_COLLATE/CTYPE? LC_CTYPE. In 8.3 and up where we constrain that to match the database encoding, we only have a problem with the C locale. -- Heikki

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Tom Lane

Heikki Linnakangas writes: > Tom Lane wrote: >> Another idea is to try the values listed in our encoding_match_list[] >> until bind_textdomain_codeset succeeds. The problem here is that the >> GNU documentation is *exceedingly* vague about whether >> bind_textdomain_codeset behaves sanely (ie thr

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas

Tom Lane wrote: Another idea is to try the values listed in our encoding_match_list[] until bind_textdomain_codeset succeeds. The problem here is that the GNU documentation is *exceedingly* vague about whether bind_textdomain_codeset behaves sanely (ie throws a recognizable error) when given a b

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Tom Lane

Heikki Linnakangas writes: > In CVS HEAD, we call bind_textdomain_codeset() in SetDatabaseEncoding() > which fixes that, but we only do it on Windows. In earlier versions we > called it on all platforms, but only for UTF-8. It seems that we should > call bind_textdomain_codeset on all platforms

[HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas

latin1db=# SELECT version();
 version
---
 PostgreSQL 8.3.7 on i686-pc-linux-gnu, compiled by GCC gcc (Debian 4.3.3-5) 4.3.3
(1 row)
latin1db=# SELECT name, setting FROM pg_se

