Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-26 Zeugswetter Andreas SB SD
There are no such libraries. I keep hearing ICU, but that is much too bloated. At least it is kind of standard and also something that will be maintained for the foreseeable future; it also has a compatible license and is available on all platforms of interest to PostgreSQL. And it is used

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-26 Kurt Roeckx
On Tue, Nov 25, 2003 at 04:19:05PM -0500, Tom Lane wrote: UCS-2 is impractical without some *extremely* wide-ranging changes in the backend. To take just the most obvious point, doesn't it require allowing embedded zero bytes in text strings? If you're going to use Unicode in the rest of

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Tatsuo Ishii
OK, I've been spreading rumours about fixing the internationalization problems, so let me make it a bit more clear. Here are the problems that need to be fixed: - Only one locale per process possible. - Only one gettext-language per process possible. - lc_collate and lc_ctype need to

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Dennis Bjorklund
On Mon, 24 Nov 2003, Peter Eisentraut wrote: 1. Take out the character set conversion routines from the backend and make them a library of their own. This could possibly be modelled after iconv, but not necessarily. Or we might conclude that we can just use iconv in the first place. 2.
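Item 1 of the roadmap proposes pulling the character set conversion routines out of the backend into a library of their own, possibly modelled after iconv. The Python sketch below shows only the general shape of such an interface, assuming a hypothetical `Converter` class that, like `iconv_open()` followed by `iconv()`, is opened once for a (to, from) encoding pair and then applied to byte buffers. Python's own codecs do the work here by pivoting through Unicode; this is not how the backend's routines are written.

```python
# Hypothetical iconv-style converter: open once for an encoding pair, then
# convert byte buffers. Python's codecs pivot through Unicode internally.
import codecs


class Converter:
    """Hypothetical converter object, loosely modelled on iconv_open()/iconv()."""

    def __init__(self, to_encoding: str, from_encoding: str) -> None:
        # Argument order mirrors iconv_open(tocode, fromcode).
        # codecs.lookup() raises LookupError for an unknown encoding,
        # much as iconv_open() fails for an unsupported pair.
        self._decode = codecs.lookup(from_encoding).decode
        self._encode = codecs.lookup(to_encoding).encode

    def convert(self, data: bytes) -> bytes:
        text, _consumed = self._decode(data)  # source bytes -> Unicode
        out, _length = self._encode(text)     # Unicode -> target bytes
        return out


euc_to_utf8 = Converter("utf-8", "euc_jp")
src = "漢字".encode("euc_jp")
print(len(src), len(euc_to_utf8.convert(src)))  # 4 6
```

Opening the converter once and reusing it matches the iconv usage pattern, where looking up the conversion is kept separate from the per-buffer work.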

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Peter Eisentraut
Tatsuo Ishii writes: 3. Implement Unicode collation algorithm and character classification routines that are aware of 1. Use that in place of system locale routines. I don't see the relationship between Unicode and your plan to replace the system locale routines. If you are

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Peter Eisentraut
Dennis Bjorklund writes: Force all translations to be in Unicode and convert to other client encodings if needed. There is no need to support translations stored using different encodings. Tell that to the Japanese. Couldn't we use some library that already has this, like glib (or
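Dennis's proposal is that every translation is stored once in Unicode and converted to the client encoding on output; together with item 2 of the roadmap, language and encoding could then change at run time. A minimal Python sketch of that idea, assuming a hypothetical in-memory `CATALOG` and `translate()` function in place of gettext's message files. The catalog entries are illustrative, not PostgreSQL's actual translations, and the sketch does not address the objection raised in this reply.

```python
# Hypothetical message catalog in which every translation is stored once, as
# Unicode text, and converted to the client encoding only when it is sent.
# Language and client encoding are per-call arguments, so a session could
# switch either at run time. The entries are illustrative only.
CATALOG = {
    ("de", "division by zero"): "Division durch Null",
    ("ja", "division by zero"): "ゼロによる除算",
}


def translate(msgid: str, language: str, client_encoding: str) -> bytes:
    """Return msgid translated into language, encoded for the client.

    Falls back to the untranslated msgid when no entry exists, as gettext
    does. Raises UnicodeEncodeError if the client encoding cannot represent
    the translation (e.g. Japanese text for a Latin-1 client).
    """
    text = CATALOG.get((language, msgid), msgid)
    return text.encode(client_encoding)


# The same stored Japanese message, delivered to two different clients.
print(len(translate("division by zero", "ja", "utf-8")))   # 21
print(len(translate("division by zero", "ja", "euc_jp")))  # 14
```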

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Zeugswetter Andreas SB SD
Have you looked at what is available from http://oss.software.ibm.com/icu/? It seems they have a compatible license, but they use some C++. Andreas

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Dennis Bjorklund
On Tue, 25 Nov 2003, Peter Eisentraut wrote: Force all translations to be in Unicode and convert to other client encodings if needed. There is no need to support translations stored using different encodings. Tell that to the Japanese. I've always thought Unicode was enough to even

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Tatsuo Ishii
On Tue, 25 Nov 2003, Peter Eisentraut wrote: Force all translations to be in Unicode and convert to other client encodings if needed. There is no need to support translations stored using different encodings. Tell that to the Japanese. I've always thought Unicode was enough to

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Dennis Bjorklund
On Tue, 25 Nov 2003, Tatsuo Ishii wrote: I'm tired of pointing out that Unicode is not that perfect. Another gotcha with Unicode is that the UTF-8 encoding (which we currently use) consumes 3 bytes for each Kanji character, while other encodings consume only 2 bytes. IMO a 3/2 storage ratio could not be
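Tatsuo's 3/2 storage ratio can be checked directly. This short Python sketch encodes the same two Kanji, and for contrast three ASCII letters, in UTF-8, EUC-JP, Shift_JIS and UTF-16; the sample strings and the `byte_lengths()` helper are arbitrary choices for illustration. Kanji take 3 bytes in UTF-8 and 2 in the other encodings, while ASCII doubles in size in UTF-16, which is the trade-off behind the UCS-2 and UTF-16 suggestions elsewhere in the thread.

```python
# Storage cost of the same text in several encodings (Python codec names).
ENCODINGS = ("utf-8", "euc_jp", "shift_jis", "utf-16-le")


def byte_lengths(text: str) -> dict:
    """Map each encoding name to the number of bytes text occupies in it."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


print(byte_lengths("漢字"))  # {'utf-8': 6, 'euc_jp': 4, 'shift_jis': 4, 'utf-16-le': 4}
print(byte_lengths("abc"))   # {'utf-8': 3, 'euc_jp': 3, 'shift_jis': 3, 'utf-16-le': 6}
```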

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Tom Lane
Peter Eisentraut writes: Dennis Bjorklund writes: Couldn't we use some library that already has this, like glib (or something else)? If it's not up to what we need, then fix that library instead. I wasn't aware that glib had this. I'll look. Of course the trouble with

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Tom Lane
Peter Eisentraut writes: Actually, what will more likely happen is that we'll define a collation as a collection of one or more support functions, the equivalents of strxfrm() and possibly a few more. Then it will be up to those functions to define the collation order. The
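Peter's idea of a collation as a collection of support functions, the equivalents of strxfrm() and possibly a few more, can be sketched in a few lines. This Python version assumes a hypothetical `Collation` record holding a single transform function; the two collations are illustrative. Note one difference from C: real strxfrm() returns a string meant to be compared with strcmp(), whereas here the key is any Python value compared with Python's own ordering.

```python
# Hypothetical model: a collation is a record of support functions. Only a
# strxfrm()-like transform is modelled; comparing the transformed keys gives
# the collation order, and a strcoll()-like compare falls out of it.
from typing import Any, Callable, List, NamedTuple


class Collation(NamedTuple):
    name: str
    xfrm: Callable[[str], Any]  # strxfrm() analogue: string -> sort key


# Code-point order, similar to the C locale.
BINARY = Collation("binary", lambda s: s)
# Illustrative case-insensitive collation; ties are broken by the original
# string so that distinct strings never compare equal.
CASE_INSENSITIVE = Collation("case_insensitive", lambda s: (s.casefold(), s))


def collate_sort(values: List[str], collation: Collation) -> List[str]:
    """Sort values by precomputed keys, as a strxfrm()-based sort would."""
    return sorted(values, key=collation.xfrm)


def compare(a: str, b: str, collation: Collation) -> int:
    """strcoll() analogue: negative, zero or positive."""
    ka, kb = collation.xfrm(a), collation.xfrm(b)
    return (ka > kb) - (ka < kb)


words = ["b", "A", "a", "B"]
print(collate_sort(words, BINARY))            # ['A', 'B', 'a', 'b']
print(collate_sort(words, CASE_INSENSITIVE))  # ['A', 'a', 'B', 'b']
```

Because the ordering lives entirely in the transform, the sort and comparison code stays the same whichever collation is attached to a column or index.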

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Doug McNaught
Tom Lane writes: Peter Eisentraut writes: I wasn't aware that glib had this. I'll look. Of course the trouble with relying on glibc is that we'd have no solution for platforms that don't use glibc. glib != glibc. glib is the low-level library used

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Hannu Krosing
Dennis Bjorklund wrote on Tue, 25.11.2003 at 14:51: On Tue, 25 Nov 2003, Tatsuo Ishii wrote: I'm tired of pointing out that Unicode is not that perfect. Of course not, but neither is the current multibyte code, with only marginal support for Unicode (many people actually need upper()/lower())
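Hannu's point about upper()/lower() is easy to see by comparing a case mapping that only knows ASCII letters with a Unicode-aware one. The Python sketch below uses `bytes.upper()`, which changes only a-z, as a stand-in for byte-oriented code; the function names and sample words are hypothetical.

```python
def ascii_upper(text: str) -> str:
    """Uppercase only ASCII a-z, working on the UTF-8 bytes."""
    return text.encode("utf-8").upper().decode("utf-8")


def unicode_upper(text: str) -> str:
    """Unicode-aware case mapping, including one-to-many mappings."""
    return text.upper()


print(ascii_upper("über"), unicode_upper("über"))      # üBER ÜBER
print(ascii_upper("straße"), unicode_upper("straße"))  # STRAßE STRASSE
```

The second line shows why a correct upper() cannot work character by character in place: "ß" becomes the two letters "SS", so the result can be longer than the input.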

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Greg Stark
Peter Eisentraut writes: 2. Reimplement gettext to use 1. and allow switching of language and encoding at run-time. 3. Implement Unicode collation algorithm and character classification routines that are aware of 1. Use that in place of system locale routines. This

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Kurt Roeckx
On Tue, Nov 25, 2003 at 08:40:57PM +0900, Tatsuo Ishii wrote: On Tue, 25 Nov 2003, Peter Eisentraut wrote: I've always thought Unicode was enough to even represent Japanese. Then the client encoding can be something else that we can convert to. In any case, the encoding of the message

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Peter Eisentraut
Greg Stark writes: This sounds like you want to completely reimplement all of the locale handling provided by the OS? That seems like a dead-end approach to me. There's no way your handling will ever be as complete or as well optimized as some OS's. Actually, I'm pretty sure it will be more

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Hannu Krosing
Peter Eisentraut wrote on Tue, 25.11.2003 at 21:13: Greg Stark writes: This sounds like you want to completely reimplement all of the locale handling provided by the OS? That seems like a dead-end approach to me. There's no way your handling will ever be as complete or as well optimized

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Tom Lane
Kurt Roeckx writes: You can encode Unicode in different ways, and UTF-8 is only one of them. Is there a problem with using UCS-2 (except that it would require more storage for ASCII)? UCS-2 is impractical without some *extremely* wide-ranging changes in the backend. To take
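Tom's objection to UCS-2 follows from how C code finds the end of a string: every ASCII character encoded in UCS-2 carries a 0x00 byte, so NUL-terminated handling breaks. The Python sketch below assumes a hypothetical `c_strlen()` helper that mimics C's strlen(). Python has no UCS-2 codec, so UTF-16-LE stands in for it; the two produce identical bytes for characters in the Basic Multilingual Plane. UTF-8, by contrast, never produces a zero byte except when encoding U+0000 itself.

```python
def c_strlen(buf: bytes) -> int:
    """Mimic C strlen(): count bytes up to the first NUL (or the whole buffer)."""
    nul = buf.find(b"\x00")
    return len(buf) if nul == -1 else nul


text = "abc"
utf8 = text.encode("utf-8")      # b'abc'
ucs2 = text.encode("utf-16-le")  # b'a\x00b\x00c\x00'

print(c_strlen(utf8), len(utf8))  # 3 3
print(c_strlen(ucs2), len(ucs2))  # 1 6
```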

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Christopher Kings-Lynne
About storing data in the database, I would expect it to work with any encoding, just like I would expect pg to be able to store images in any format. What's stopping us from supporting the other Unicode encodings, e.g. UTF-16, which could save Japanese storage space? Chris

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Tom Lane
Greg Stark writes: The only advantage to adding locales per-column and/or per-index is the notational simplicity. Well, actually, the reason we are interested in doing it is that the SQL spec demands it. regards, tom lane

[HACKERS] A rough roadmap for internationalization fixes

2003-11-24 Peter Eisentraut
OK, I've been spreading rumours about fixing the internationalization problems, so let me make it a bit more clear. Here are the problems that need to be fixed: - Only one locale per process possible. - Only one gettext-language per process possible. - lc_collate and lc_ctype need to be held