Title: RE: Roundtripping in Unicode

Arcane Jill wrote:
> The obvious solution is for all Unix machines everywhere to be using the
> same locale - and it had better be UTF-8. But an instantaneous global
> switch-over is never going to happen, so we see this gradual switch-over ...
> and it is during this transition phase that Lars's problem manifests.

Yes, some may not experience it, some will experience it for a day, some for a month, some for a year, some indefinitely.

And unless filesystems prevent invalid sequences from being added, it will keep happening to everybody. And if it happens only very seldom, it will be even harder to find a person who can fix it.

> Of course, you are not /really/ suggesting that the Unix kernel
> be rewritten. But it's hard for me to see how else this could be
> achieved.

What one might pursue is to make the UNIX filesystem locale-invariant, that is, Windows-like. In that scenario, the filesystem stores filenames as Unicode strings and adjusts their representation according to the user's locale. But there are two reasons against it:

A - If only the filesystem does it, then whenever you switch the locale, every reference to a filename that is stored inside another file breaks. Unless you convert the contents of files in the same manner, which is what Windows does if an application is not Unicode (with a number of associated problems on top). But that is not what is supposed to be done on UNIX.
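To illustrate point A, here is a small sketch of what goes wrong. The filename and the two locales are only assumptions picked for the example; nothing here is an actual filesystem API.

```python
# Hypothetical example: a filename with one non-ASCII character.
name = "café"

# A reference to the file, written into some other file (say, a config
# file) while the user was working in a Latin-1 locale.
stored_reference = name.encode("latin-1")

# What a locale-converting filesystem would present for the same file
# after the user has switched to a UTF-8 locale.
presented_name = name.encode("utf-8")


def reference_still_resolves(reference: bytes, presented: bytes) -> bool:
    """A byte-wise lookup succeeds only if both representations match."""
    return reference == presented


# The bytes in the config file were never converted, so the lookup fails.
broken = not reference_still_resolves(stored_reference, presented_name)
```

The filesystem converted its side, the config file did not, and so the stored reference no longer names anything.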

B - As we move to UTF-8, there will be less and less need to use different locales. So why bother enabling the system to represent UTF-8 filenames in any other locale if that locale will not even be used anymore? Concerns about the transition period do apply, but then you end up with two transitions, which is even less appealing.


So, the only conceivable option is to start thinking about validation in the filesystem, enabled if and when one chooses to enable it. But keep in mind that it will only reduce the problem. Not all programs will be able to rely on it (like virus scanners, HSM, backup, ...).
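As a rough sketch of what such a validation check could look like, assuming the filesystem (or a wrapper around it) sees the raw bytes of a name before creating it. The function name is made up for illustration; it relies only on Python's strict UTF-8 decoder, which rejects stray bytes, truncated sequences, overlong forms and surrogates.

```python
def is_valid_utf8_name(raw_name: bytes) -> bool:
    """Return True if the raw filename bytes are well-formed UTF-8."""
    try:
        raw_name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# A name created under a UTF-8 locale passes.
ok_name = "café".encode("utf-8")

# The same name created under a Latin-1 locale is rejected, because the
# byte 0xE9 starts a multi-byte sequence that is never completed.
legacy_name = "café".encode("latin-1")
```

A validating filesystem would refuse to create `legacy_name`. Programs that must handle whatever is already on disk still have to cope with names that fail this check.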


Lars
