RE: Roundtripping in Unicode

Lars Kristan Mon, 13 Dec 2004 17:12:56 -0800

Peter Kirk wrote:

> Now no doubt many Unix filename handling utilities ignore the fact that
> some octets are invalid or uninterpretable in the locale, because they
> handle filenames as octet strings (with 0x00 and 0x2F having special
> interpretations) rather than as locale-dependent character strings. But
> these routines should continue to work in a UTF-8 locale, as they make
> no attempt to interpret any octets other than 0x00 and 0x2F.

Hmmmmm, here lies the catch. According to the UTC, you need to keep processing UNIX filenames as BINARY data. And, also according to the UTC, any UTF-8 function is allowed to reject invalid sequences. Taken together, that basically means you are not supposed to use strcpy to process filenames.

Well, I just hope no one will listen to them and modify strcpy and strchr to validate the data when running in a UTF-8 locale and start signalling something (really, where and how?!). The two statements from the UTC don't make sense when put together. Unless we are really expected to start building everything from scratch.
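
To make the point concrete, here is a minimal sketch of the kind of filename handling being discussed. The helper names last_component and copy_name are hypothetical, made up for this illustration. It assumes the usual C library behaviour, where strrchr, strlen and strcpy work on bytes and never look at the locale, so only 0x2F and the terminating 0x00 are ever interpreted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the last component of a path.
 * The path is treated as an octet string; only 0x2F ('/') and the
 * terminating 0x00 have any meaning here. */
const char *last_component(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Hypothetical helper: copy a filename byte for byte into a buffer
 * supplied by the caller. Returns 0 on success, -1 if the buffer is
 * too small. No octet is validated against any encoding. */
int copy_name(char *dst, size_t dst_size, const char *src)
{
    if (strlen(src) >= dst_size)
        return -1;
    strcpy(dst, src);   /* plain byte copy, including the final 0x00 */
    return 0;
}
```

Given a path such as "/tmp/caf\xE9.txt", where the lone 0xE9 octet is not valid UTF-8, last_component still finds the name and copy_name still copies it unchanged. If these library functions started rejecting invalid sequences in a UTF-8 locale, code like this would have no place to report the failure.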

> All of this is ingenious, and may be useful for internal processing
> within a Unix system, and perhaps even for interaction between
> cooperating systems. But NOT-Unicode is not Unicode (!) and so Unicode
> should not be expected to standardise it.

Not by definition. But if standardising it would help users by simplifying the transition, then why not?

Lars
