There are no such libraries. I keep hearing ICU, but that is much too
bloated.
At least it is kind of standard and also something that will be
maintained for the foreseeable future. It also has a compatible license and
is available on all platforms of interest to PostgreSQL.
And it is used
On Tue, Nov 25, 2003 at 04:19:05PM -0500, Tom Lane wrote:
UCS-2 is impractical without some *extremely* wide-ranging changes in
the backend. To take just the most obvious point, doesn't it require
allowing embedded zero bytes in text strings?
If you're going to use unicode in the rest of
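To illustrate the embedded-zero-byte concern quoted above, here is a minimal Python sketch (not backend code). The sample string and the `c_strlen` helper are hypothetical, and UTF-16LE is used as a stand-in for UCS-2 on the assumption that only Basic Multilingual Plane characters are involved, where the two coincide. It shows where a C-style NUL-terminated scan would stop.

```python
def c_strlen(buf: bytes) -> int:
    """Mimic C strlen(): count bytes up to the first zero byte."""
    idx = buf.find(b"\x00")
    return len(buf) if idx == -1 else idx


text = "abc"                      # hypothetical sample value
utf8 = text.encode("utf-8")       # no zero bytes for non-NUL characters
ucs2 = text.encode("utf-16-le")   # every ASCII character gains a zero byte

print(utf8, len(utf8), c_strlen(utf8))
print(ucs2, len(ucs2), c_strlen(ucs2))
```

Under these assumptions the UTF-8 form survives a NUL-terminated scan intact, while the two-byte form is cut off after its first byte, which is why code that treats text as C strings would need changing.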
OK, I've been spreading rumours about fixing the internationalization
problems, so let me make it a bit more clear. Here are the problems that
need to be fixed:
- Only one locale per process possible.
- Only one gettext-language per process possible.
- lc_collate and lc_ctype need to
On Mon, 24 Nov 2003, Peter Eisentraut wrote:
1. Take out the character set conversion routines from the backend and
make them a library of their own. This could possibly be modelled after
iconv, but not necessarily. Or we might conclude that we can just use
iconv in the first place.
2.
Tatsuo Ishii writes:
3. Implement Unicode collation algorithm and character classification
routines that are aware of 1. Use that in place of system locale
routines.
I don't see the relationship between Unicode and whatever you are going
to replace the system locale routines with. If you are
Dennis Bjorklund writes:
Force all translations to be in unicode and convert to other client
encodings if needed. There is no need to support translations stored using
different encodings.
Tell that to the Japanese.
Couldn't we use some library that already has this, like glib (or
Have you looked at what is available from
http://oss.software.ibm.com/icu/ ?
Seems they have a compatible license, but use some C++.
Andreas
On Tue, 25 Nov 2003, Peter Eisentraut wrote:
Force all translations to be in unicode and convert to other client
encodings if needed. There is no need to support translations stored using
different encodings.
Tell that to the Japanese.
I've always thought unicode was enough to even
On Tue, 25 Nov 2003, Tatsuo Ishii wrote:
I'm tired of saying that Unicode is not that perfect. Another gotcha
with Unicode is that the UTF-8 encoding (which we currently use) consumes 3
bytes for each Kanji character, while other encodings consume only 2
bytes. IMO 3/2 storage ratio could not be
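The storage ratio mentioned above can be checked with a short Python sketch. The sample word and the choice of EUC-JP and Shift_JIS as the "other encodings" are assumptions made for illustration; the snippet does not name them.

```python
kanji = "\u6f22\u5b57"  # the two-character word "kanji", a hypothetical sample

# Encoded size in bytes for each candidate encoding.
sizes = {enc: len(kanji.encode(enc)) for enc in ("utf-8", "euc_jp", "shift_jis")}

for enc, size in sizes.items():
    print(f"{enc:10s} {size} bytes, {size / len(kanji):.0f} per character")

# Ratio of UTF-8 storage to EUC-JP storage for this sample.
ratio = sizes["utf-8"] / sizes["euc_jp"]
print(ratio)
```

For this sample the ratio works out to 3/2; text that mixes in ASCII would see a smaller overhead, since ASCII takes one byte in all three encodings.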
Peter Eisentraut writes:
Dennis Bjorklund writes:
Couldn't we use some library that already has this, like glib (or
something else)? If it's not up to what we need, then fix that library
instead.
I wasn't aware that glib had this. I'll look.
Of course the trouble with
Peter Eisentraut writes:
Actually, what will more likely happen is that we'll define a collation as
a collection of one or more support functions, the equivalents of
strxfrm() and possibly a few more. Then it will be up to those functions
to define the collation order. The
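One way to picture "a collation as a collection of support functions" is the following Python sketch. It is only an illustration under stated assumptions: the `Collation` class, the `sort_with` helper, and both sample collations are hypothetical names, not anything proposed in the thread, and the single support function here plays the role of a `strxfrm()` equivalent that maps a string to a sort key.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Collation:
    """Hypothetical collation: a name plus a strxfrm()-like key function."""
    name: str
    xfrm: Callable[[str], str]


# Two illustrative collations; the ordering is defined entirely by xfrm.
binary = Collation("binary", lambda s: s)
case_insensitive = Collation("case_insensitive", lambda s: s.casefold())


def sort_with(collation: Collation, values: List[str]) -> List[str]:
    """Order values by comparing their transformed keys."""
    return sorted(values, key=collation.xfrm)


words = ["banana", "apple", "Cherry"]
print(sort_with(binary, words))
print(sort_with(case_insensitive, words))
```

The point of the design is that the sorting code never inspects the collation's rules; swapping the support function is enough to change the order.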
Tom Lane writes:
Peter Eisentraut writes:
I wasn't aware that glib had this. I'll look.
Of course the trouble with relying on glibc is that we'd have no solution
for platforms that don't use glibc.
glib != glibc. glib is the low-level library used
Dennis Bjorklund wrote on Tue, 25.11.2003 at 14:51:
On Tue, 25 Nov 2003, Tatsuo Ishii wrote:
I'm tired of saying that Unicode is not that perfect.
Of course not, but neither is the current multibyte with only marginal
support for Unicode (many people actually need upper()/lower()).
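The upper()/lower() complaint can be illustrated with a small Python sketch. The sample word is hypothetical, and Python's byte-wise `bytes.upper()` is used only as a rough analogy for case mapping that is unaware of the encoding; it is not a model of what the backend actually does.

```python
word = "\u00e9lan"          # "elan" with an acute accent on the first letter
raw = word.encode("utf-8")  # the accented letter becomes two bytes

# Byte-wise case mapping: only ASCII a-z are changed, multibyte
# sequences pass through untouched.
byte_upper = raw.upper().decode("utf-8")

# Character-aware case mapping: works on code points, so the
# accented letter is mapped as well.
char_upper = word.upper()

print(byte_upper)
print(char_upper)
```

The byte-wise result leaves the accented letter in lower case while the character-aware one converts it, which is the kind of gap users notice first.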
Peter Eisentraut writes:
2. Reimplement gettext to use 1. and allow switching of language and
encoding at run-time.
3. Implement Unicode collation algorithm and character classification
routines that are aware of 1. Use that in place of system locale
routines.
This
On Tue, Nov 25, 2003 at 08:40:57PM +0900, Tatsuo Ishii wrote:
On Tue, 25 Nov 2003, Peter Eisentraut wrote:
I've always thought Unicode was enough to even represent Japanese. Then
the client encoding can be something else that we can convert to. In any
case, the encoding of the message
Greg Stark writes:
This sounds like you want to completely reimplement all of the locale handling
provided by the OS? That seems like a dead-end approach to me. There's no way
your handling will ever be as complete or as well optimized as some OS's.
Actually, I'm pretty sure it will be more
Peter Eisentraut wrote on Tue, 25.11.2003 at 21:13:
Greg Stark writes:
This sounds like you want to completely reimplement all of the locale handling
provided by the OS? That seems like a dead-end approach to me. There's no way
your handling will ever be as complete or as well optimized
Kurt Roeckx writes:
You can encode Unicode in different ways, and UTF-8 is only one
of them. Is there a problem with using UCS-2 (except that it
would require more storage for ASCII)?
UCS-2 is impractical without some *extremely* wide-ranging changes in
the backend. To take
About storing data in the database, I would expect it to work with any
encoding, just like I would expect pg to be able to store images in any
format.
What's stopping us from supporting the other Unicode encodings, e.g. UCS-16,
which could save Japanese storage space?
Chris
Greg Stark writes:
The only advantage to adding locales per-column and/or per-index is the
notational simplicity.
Well, actually, the reason we are interested in doing it is the SQL spec
demands it.
regards, tom lane
OK, I've been spreading rumours about fixing the internationalization
problems, so let me make it a bit more clear. Here are the problems that
need to be fixed:
- Only one locale per process possible.
- Only one gettext-language per process possible.
- lc_collate and lc_ctype need to be held