Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-26 Thread Kurt Roeckx
On Tue, Nov 25, 2003 at 04:19:05PM -0500, Tom Lane wrote: > > UCS-2 is impractical without some *extremely* wide-ranging changes in > the backend. To take just the most obvious point, doesn't it require > allowing embedded zero bytes in text strings? If you're going to use unicode in the rest of
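[Editor's illustration, not part of the original message.] The embedded-zero-byte point quoted above can be seen with a few lines of Python. This is only a sketch: "utf-16-be" stands in for UCS-2 (the two agree for characters in the Basic Multilingual Plane), and the helper names has_embedded_nul and c_string_view are made up for the example. It says nothing about how the backend actually stores text.

```python
# Illustrative only: compare how two Unicode encodings serialize ASCII text.
text = "abc"

utf8_bytes = text.encode("utf-8")
# Assumption: "utf-16-be" is used as a stand-in for UCS-2 here.
ucs2_bytes = text.encode("utf-16-be")


def has_embedded_nul(data: bytes) -> bool:
    """Return True if the byte string contains a zero byte."""
    return b"\x00" in data


def c_string_view(data: bytes) -> bytes:
    """Mimic what a routine that stops at the first NUL byte would see."""
    return data.split(b"\x00", 1)[0]


print("utf-8 has NUL:", has_embedded_nul(utf8_bytes))
print("ucs-2 has NUL:", has_embedded_nul(ucs2_bytes))
print("ucs-2 seen as a C string:", c_string_view(ucs2_bytes))
```

Any code that treats text as a NUL-terminated string would see the two-byte form as truncated, which is one reason a change like this would reach far into existing string handling.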

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-26 Thread Zeugswetter Andreas SB SD
> > There are no such libraries. I keep hearing ICU, but that is much too > > bloated. > > At least it is kind of "standard" and also something that will be > maintained for the foreseeable future, it also has a compatible license and > is available on all platforms of interest to postgresql. And i

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > The only advantage to adding locales per-column and/or per-index is the > notational simplicity. Well, actually, the reason we are interested in doing it is the SQL spec demands it. regards, tom lane

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Christopher Kings-Lynne
About storing data in the database, I would expect it to work with any encoding, just like I would expect pg to be able to store images in any format. What's stopping us supporting the other Unicode encodings, eg. UCS-16 which could save Japanese storage space? Chris

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Tom Lane
Kurt Roeckx <[EMAIL PROTECTED]> writes: > You can encode unicode in different ways, and UTF-8 is only one > of them. Is there a problem with using UCS-2 (except that it > would require more storage for ASCII)? UCS-2 is impractical without some *extremely* wide-ranging changes in the backend. To

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Hannu Krosing
Peter Eisentraut kirjutas T, 25.11.2003 kell 21:13: > Greg Stark writes: > > > This sounds like you want to completely reimplement all of the locale handling > > provided by the OS? That seems like a dead-end approach to me. There's no way > > your handling will ever be as complete or as well opti

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Peter Eisentraut
Greg Stark writes: > This sounds like you want to completely reimplement all of the locale handling > provided by the OS? That seems like a dead-end approach to me. There's no way > your handling will ever be as complete or as well optimized as some OS's. Actually, I'm pretty sure it will be more

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Kurt Roeckx
On Tue, Nov 25, 2003 at 08:40:57PM +0900, Tatsuo Ishii wrote: > > On Tue, 25 Nov 2003, Peter Eisentraut wrote: > > > > I've always thought unicode was enough to even represent Japanese. Then > > the client encoding can be something else that we can convert to. In any > > way, the encoding of the

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Greg Stark
Peter Eisentraut <[EMAIL PROTECTED]> writes: > 2. Reimplement gettext to use 1. and allow switching of language and > encoding at run-time. > > 3. Implement Unicode collation algorithm and character classification > routines that are aware of 1. Use that in place of system locale > routines. T

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Hannu Krosing
Dennis Bjorklund kirjutas T, 25.11.2003 kell 14:51: > On Tue, 25 Nov 2003, Tatsuo Ishii wrote: > > > I'm tired of telling that Unicode is not that perfect. Of course not, but neither is the current multibyte with only marginal support for unicode (many people actually need upper()/lower() ) > A

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Doug McNaught
Tom Lane <[EMAIL PROTECTED]> writes: > Peter Eisentraut <[EMAIL PROTECTED]> writes: > > > I wasn't aware that glib had this. I'll look. > > Of course the trouble with relying on glibc is that we'd have no solution > for platforms that don't use glibc. glib != glibc. glib is the low-level libr

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Actually, what will more likely happen is that we'll define a collation as > a collection of one or more support functions, the equivalents of > strxfrm() and possibly a few more. Then it will be up to those functions > to define the collation order.
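[Editor's illustration, not part of the original message.] The idea of a collation as a small set of support functions, with a strxfrm()-style transform at its core, might be sketched roughly as follows. Everything here is hypothetical: the Collation class, its xfrm field, and the two sample collations are invented for the example and are not a proposed backend API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Collation:
    """Hypothetical collation: a name plus a strxfrm()-like key function."""
    name: str
    xfrm: Callable[[str], Any]  # maps a string to a directly comparable key

    def compare(self, a: str, b: str) -> int:
        """Return -1, 0 or 1, derived purely from the transformed keys."""
        ka, kb = self.xfrm(a), self.xfrm(b)
        return (ka > kb) - (ka < kb)


# Two toy collations; real ones would implement locale-specific rules.
BYTE_ORDER = Collation("byte_order", lambda s: s.encode("utf-8"))
CASE_INSENSITIVE = Collation("case_insensitive", str.casefold)

words = ["b", "A", "a", "B"]
by_bytes = sorted(words, key=BYTE_ORDER.xfrm)
by_ci = sorted(words, key=CASE_INSENSITIVE.xfrm)

print(by_bytes)
print(by_ci)
print(CASE_INSENSITIVE.compare("a", "A"), BYTE_ORDER.compare("a", "A"))
```

Because the ordering is defined entirely by the function attached to the collation, different columns or indexes could in principle carry different Collation objects without touching a process-wide locale setting.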

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Dennis Bjorklund writes: >> Couldn't we use some library that already has this, like glib (or >> something else). If it's not up to what we need, then fix that library >> instead. > I wasn't aware that glib had this. I'll look. Of course the troubl

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Dennis Bjorklund
On Tue, 25 Nov 2003, Tatsuo Ishii wrote: > I'm tired of telling that Unicode is not that perfect. Another gotcha > with Unicode is the UTF-8 encoding (currently we use) consumes 3 > bytes for each Kanji character, while other encodings consume only 2 > bytes. IMO 3/2 storage ratio could not be ne

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Dennis Bjorklund
On Tue, 25 Nov 2003, Tatsuo Ishii wrote: > I'm tired of telling that Unicode is not that perfect. Another gotcha > with Unicode is the UTF-8 encoding (currently we use) consumes 3 > bytes for each Kanji character, while other encodings consume only 2 > bytes. IMO 3/2 storage ratio could not be ne
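[Editor's illustration, not part of the original message.] The 3/2 storage ratio mentioned in the quoted text can be checked with a short Python sketch. The sample string and the list of encodings are chosen for the example only; the codec names are the ones Python ships with.

```python
# Two common Kanji characters, both present in JIS X 0208.
sample = "漢字"

# Encodings to compare; this selection is an assumption made for the example.
encodings = ["utf-8", "euc_jp", "shift_jis", "utf-16-be"]

# Number of bytes needed to store the sample in each encoding.
sizes = {enc: len(sample.encode(enc)) for enc in encodings}

for enc, n in sizes.items():
    print(f"{enc:10s} {n} bytes ({n / len(sample):.0f} per character)")

# Storage ratio of UTF-8 relative to EUC-JP for this sample.
ratio = sizes["utf-8"] / sizes["euc_jp"]
print("utf-8 / euc_jp ratio:", ratio)
```

For text that is mostly Kanji the ratio stays close to 3/2, while ASCII-heavy text shifts the balance the other way, since UTF-8 stores ASCII in one byte and a two-byte Unicode form would double it.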

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Tatsuo Ishii
> On Tue, 25 Nov 2003, Peter Eisentraut wrote: > > > > Force all translations to be in unicode and convert to other client > > > encodings if needed. There is no need to support translations stored using > > > different encodings. > > > > Tell that to the Japanese. > > I've always thought unicod

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Dennis Bjorklund
On Tue, 25 Nov 2003, Peter Eisentraut wrote: > > Force all translations to be in unicode and convert to other client > > encodings if needed. There is no need to support translations stored using > > different encodings. > > Tell that to the Japanese. I've always thought unicode was enough to ev

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Zeugswetter Andreas SB SD
Have you looked at what is available from http://oss.software.ibm.com/icu/ ? Seems they have a compatible license, but use some C++. Andreas

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Peter Eisentraut
Dennis Bjorklund writes: > Force all translations to be in unicode and convert to other client > encodings if needed. There is no need to support translations stored using > different encodings. Tell that to the Japanese. > Couldn't we use some library that already has this, like glib (or > som

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Peter Eisentraut
Tatsuo Ishii writes: > > 3. Implement Unicode collation algorithm and character classification > > routines that are aware of 1. Use that in place of system locale > > routines. > > I don't see a relationship between Unicode and the one you are going > to replace the system locale routines. If yo

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Dennis Bjorklund
On Mon, 24 Nov 2003, Peter Eisentraut wrote: > 1. Take out the character set conversion routines from the backend and > make them a library of their own. This could possibly be modelled after > iconv, but not necessarily. Or we might conclude that we can just use > iconv in the first place. > >

Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Tatsuo Ishii
> OK, I've been spreading rumours about fixing the internationalization > problems, so let me make it a bit more clear. Here are the problems that > need to be fixed: > > - Only one locale per process possible. > > - Only one gettext-language per process possible. > > - lc_collate and lc_ctype