On 2013-07-09 20:21:33 +0200, Branko Čibej wrote:
> Unlike on Windows and Mac OS (the latter at least with HFS+), there is no
> notion of native filesystem encoding on other Unix-like platforms. The
> best we can do is look at the locale settings, specifically, LC_CTYPE.

No, the best you can do is to let the user choose. LC_CTYPE typically
specifies the encoding used by the *terminal*, and this encoding may
change when the user connects by SSH from a terminal with a different
encoding.

> I posit that if the "native encoding" is supposed to be UTF-8, then it
> is an error to use LANG=C at all. Instead, one should use LANG=C.UTF-8.

LANG=C.UTF-8 is completely non-portable for scripts. For instance:

xvii:~> LANG=C.UTF-8 cp
cp: opérande de fichier manquant
Saisissez « cp --help » pour plus d'informations.
xvii:~> LANG=C cp
cp: missing file operand
Try 'cp --help' for more information.

A script that needs to work in some well-defined way, in particular
with English messages (if they need to be parsed), must use the C
(or POSIX) locale. With most tools, this is fine as they don't need
to know how filenames are encoded.

> In a context where, for example, most files were encoded in Big5
> (http://en.wikipedia.org/wiki/Big5) -- not a too far-fetched proposition
> -- it would be slightly insane, to put it mildly, for Subversion to
> assume it can just write UTF-8 to disk.

Users who want UTF-8 on disk could choose UTF-8 in a config file.
Users who want Big5 on disk could choose Big5 in a config file.
There should also be a way to have an ASCII encoding (like what is
done for URLs), for users who want things to work in every context,
with the possibly minor drawback that some filenames are hard to
read with basic tools.

> So indeed, this state of affairs puts the burden of setting up their
> locale correctly on users, but that's simply the way Unix works.

No, according to POSIX, a filename just consists of a sequence of
bytes. How to interpret it is what *you* choose.

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
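
P.S. To make the "let the user choose" idea concrete, here is a minimal
sketch in Python. It is not Subversion code: the function names and the
"ascii-escaped" setting are hypothetical, invented for this illustration.
It only shows that the mapping between a Unicode name and the bytes
stored on disk can be driven by a configured choice rather than by
LC_CTYPE.

```python
from urllib.parse import quote, unquote

# Hypothetical config values for this sketch: any Python codec name
# (e.g. "utf-8", "big5"), or the made-up value "ascii-escaped".

def encode_filename(name: str, choice: str) -> bytes:
    """Map a Unicode name to the bytes that would be written to disk."""
    if choice == "ascii-escaped":
        # Percent-encode the UTF-8 bytes, as is done for URLs, so the
        # result is pure ASCII whatever the terminal encoding is.
        return quote(name, safe="").encode("ascii")
    return name.encode(choice)

def decode_filename(raw: bytes, choice: str) -> str:
    """Interpret on-disk bytes according to the configured choice."""
    if choice == "ascii-escaped":
        return unquote(raw.decode("ascii"))
    return raw.decode(choice)

if __name__ == "__main__":
    for choice in ("utf-8", "big5", "ascii-escaped"):
        raw = encode_filename("中文", choice)
        print(choice, raw, decode_filename(raw, choice))
    # The same bytes read differently under a different interpretation:
    print(decode_filename("é".encode("utf-8"), "latin-1"))
```

The last line shows why the interpretation has to be chosen rather than
guessed: the bytes themselves do not say which encoding produced them.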