On 19.07.2013 15:22, Vincent Lefevre wrote:
> On 2013-07-09 20:21:33 +0200, Branko Čibej wrote:
>> Unlike on Windows and Mac OS (the latter at least with HFS+), there is no
>> notion of native filesystem encoding on other Unix-like platforms. The
>> best we can do is look at the locale settings, specifically, LC_CTYPE.
> No, the best you can do is to let the user choose. LC_CTYPE typically
> specifies the encoding used by the *terminal*, and this encoding may
> change when the user connects by SSH from a terminal with a different
> encoding.
>
>> I posit that if the "native encoding" is supposed to be UTF-8, then it
>> is an error to use LANG=C at all. Instead, one should use LANG=C.UTF-8.
> LANG=C.UTF-8 is completely non-portable for scripts. For instance:
>
> xvii:~> LANG=C.UTF-8 cp
> cp: opérande de fichier manquant
> Saisissez « cp --help » pour plus d'informations.
>
> xvii:~> LANG=C cp
> cp: missing file operand
> Try 'cp --help' for more information.
>
> A script that needs to work in some well-defined way, in particular
> with English messages (if they need to be parsed), must use the C
> (or POSIX) locale. With most tools, this is fine as they don't need
> to know how filenames are encoded.
Frankly, I'm not interested in portable scripts. All you're showing above
is that on your particular system, setting LANG=C.UTF-8 doesn't do
anything. So perhaps you'll have to use LC_CTYPE=UTF-8, LANG=en_US.UTF-8,
or whatever happens to work on your particular flavour of Unix-like OS.

All this is beside the point. The point is that it is not up to
Subversion to invent a new way of dealing with file-name encodings. We
use setlocale(LC_ALL, ""); this is the API that POSIX gives us, and there
is no other that I'm aware of. And we're certainly not going to break
every working copy in existence by changing the way we transcode file
names on Unix (except Mac OS, which is always UTF-8 anyway).

I'll also point out that if you /need/ consistent, parseable output in
scripts, the command-line client already provides an --xml flag.

Sure, it would be nice if POSIX defined a portable way to consistently
determine file-name encoding, or even if there were reliable,
non-portable, OS-specific ways that we could use. But I'm not aware of
any.

-- Brane

--
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. br...@wandisco.com
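For readers less familiar with the POSIX interface discussed above, here is a minimal sketch in C of what setlocale(LC_ALL, "") gives a program. It is not Subversion's actual code, and the helper name env_locale_codeset is hypothetical. With an empty locale string, each category is taken from the environment: LC_ALL if set, otherwise the category's own variable (here LC_CTYPE), otherwise LANG. nl_langinfo(CODESET) then reports the character encoding selected by the resulting LC_CTYPE.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper, not part of Subversion: adopt the locale from the
 * environment and return the codeset chosen for the LC_CTYPE category. */
const char *env_locale_codeset(void)
{
    /* "" means "take each category from the environment": LC_ALL wins if
     * set, then LC_CTYPE, then LANG; with none set, the C/POSIX locale.
     * If the requested locale is not installed, setlocale returns NULL and
     * the current locale (initially C) is left unchanged. */
    (void)setlocale(LC_ALL, "");

    /* CODESET reflects LC_CTYPE only; it says nothing about how the names
     * already stored on disk were actually encoded. */
    return nl_langinfo(CODESET);
}
```

Run under LANG=C, this typically reports an ASCII codeset (glibc calls it ANSI_X3.4-1968); under a UTF-8 locale such as en_US.UTF-8, if that locale is installed, it reports UTF-8. Either way, the value comes from the user's environment rather than from the filesystem, which is the limitation both sides of this thread are describing.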