Tuomo Valkonen wrote:
> On 2007-10-18, Russell Shaw <[EMAIL PROTECTED]> wrote:
>> What alternative is there to UTF-8? An advantage of monoculturalism is that
>> if the architecture is sufficient, everything can be consistent and easy.
> 
> There are problems with locale encoding and wchar_t, but fundamentally
> their abstraction is better than specifying a Single Global Encoding.
> Specifying "everything is UTF-8" is an evolutionary dead-end. I think
> it's better to say "here's wchar_t and functions to operate on it; we
> don't specify what the actual encoding is, so it's a black box that
> can easily be changed." Much the same goes for LC_CTYPE multibyte
> encodings. Unfortunately the standards neglect to provide convenient
> functions for encoding conversions when communicating with the
> external world (that should mostly be in libraries, seldom in
> applications), the libc multibyte routines are a bit too limited, etc.
> That could easily be solved, though, if people weren't so intent on
> creating another problem almost as big as the ASCII and Latin-1
> assumptions that we're still suffering from. Indeed, you need and want
> that kind of library to conveniently use a Single Global Standard too;
> the difference is that by specifying a particular encoding, clean
> design is not encouraged, and applications can and will expect that
> encoding rather than doing things abstractly through a handful of
> libraries that could easily be changed (or configured).
> 
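
For concreteness, that black-box abstraction is roughly what the libc
mbstowcs()/wcstombs() pair already provides. A minimal sketch, assuming
nothing about the concrete locale encoding (error handling abbreviated):

    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    int main(void)
    {
        setlocale(LC_CTYPE, "");          /* adopt the user's locale */

        const char *in = "externally supplied bytes";
        size_t n = mbstowcs(NULL, in, 0); /* measure first */
        if (n == (size_t)-1)
            return 1;                     /* invalid for this locale */

        wchar_t *w = malloc((n + 1) * sizeof *w);
        if (!w || mbstowcs(w, in, n + 1) == (size_t)-1)
            return 1;

        /* From here on, w is in an unspecified wide encoding; this
         * works whether the locale is UTF-8, Latin-1, EUC-JP, ... */
        wprintf(L"%ls\n", w);
        free(w);
        return 0;
    }
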
> Another major problem is the Unix and C "untyped" text file and stream
> legacy: you have to assume every file is in some encoding -- ASCII,
> LC_CTYPE, UTF-8, or so -- which it may not be. That could also be
> solved, e.g. by creating a "typed" plain text file (the type could be
> a MIME type stored on the fs) and stream format, assuming the locale
> encoding for legacy stuff, and opening text files through some library
> as text streams that then does the conversions to the abstract
> application-internal encoding (either a multibyte encoding -- not
> necessarily LC_CTYPE, to allow wider character ranges internally in
> programs than in legacy files -- or wide characters). That's a rather
> big task, but not really that much bigger than a transition to a
> global monoculture.
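
A sketch of what that "typed text stream" library might look like, with
iconv(3) doing the conversion step. The names text_stream, text_open()
and stored_encoding() are invented for illustration, and it assumes the
caller has already done setlocale(LC_CTYPE, ""):

    #include <iconv.h>
    #include <langinfo.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct text_stream {
        FILE   *fp;
        iconv_t cd;   /* converts stored encoding -> internal encoding */
    };

    /* Stub: a real implementation might read a MIME type stored on
     * the fs; legacy files fall back to the locale encoding. */
    static const char *stored_encoding(const char *path)
    {
        (void)path;
        return nl_langinfo(CODESET);
    }

    struct text_stream *text_open(const char *path, const char *internal)
    {
        struct text_stream *ts = malloc(sizeof *ts);
        if (!ts)
            return NULL;
        ts->fp = fopen(path, "rb");
        ts->cd = ts->fp ? iconv_open(internal, stored_encoding(path))
                        : (iconv_t)-1;
        if (!ts->fp || ts->cd == (iconv_t)-1) {
            if (ts->fp)
                fclose(ts->fp);
            free(ts);
            return NULL;
        }
        return ts;   /* reads then pass through iconv(3) before the
                      * application ever sees the bytes */
    }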

I find it hard to see those problems because I rarely handle non-English
text.

In the general-purpose editing applications I've made (like a word processor),
any non-English text is passed out to a "black box" Unicode layout processor
plugin for things like paragraph formatting, and I can make it UTF-8 or UTF-32
or whatever data encoding is convenient. I see "all UTF-8" as only applying
between completely separate applications on the PC.
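
A hypothetical version of that plugin boundary, just to show where the
encoding choice lives (the names here are invented, not from any real
plugin API):

    #include <stddef.h>

    /* The host feeds text in whatever encoding the plugin declares
     * it wants; everything behind layout_paragraph() is a black box. */
    typedef enum { ENC_UTF8, ENC_UTF32 } text_enc;

    struct layout_plugin {
        text_enc wanted;                   /* encoding it consumes */
        void (*layout_paragraph)(const void *text, /* in 'wanted' enc */
                                 size_t len,
                                 void *out);       /* opaque result */
    };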

I've done hardly any non-English processing, but iirc a UTF-8 file can start
with a byte-order mark as a signature, though it's optional (and often
discouraged). If all text files were UTF-8, even that signature wouldn't be
needed. I'm probably missing something you mean.
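
Checking for that signature is three bytes of work; the catch is that it
is optional, so its absence proves nothing. A sketch (has_utf8_bom() is
just an illustrative helper):

    #include <stdio.h>

    /* The UTF-8 "signature" is the BOM (U+FEFF) encoded in UTF-8:
     * the bytes EF BB BF. */
    static int has_utf8_bom(FILE *fp)
    {
        unsigned char b[3];
        if (fread(b, 1, 3, fp) == 3 &&
            b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return 1;
        rewind(fp);   /* no BOM: put the bytes back for the caller */
        return 0;
    }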

I find it hard to see how all kinds of config files in /etc could be made
non-7-bit-ASCII without major parsing pain. To me, config file tokens should
be 7-bit ASCII, because the content is more like program code that only
programmers should see, and any non-English configuration should be done
through an i18n-ized GUI imo (not having thought of anything better).
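
That "tokens stay 7-bit" rule is cheap to enforce in a parser, while
quoted values could still carry UTF-8 through as opaque bytes. A sketch
(is_config_token_char() is an illustrative helper):

    #include <ctype.h>

    /* Keywords and tokens must be plain 7-bit ASCII; anything else
     * is only legal inside quoted values, passed through untouched. */
    static int is_config_token_char(unsigned char c)
    {
        return c < 0x80 &&
               (isalnum(c) || c == '_' || c == '-' || c == '.');
    }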