filename encodings and conversion failure

Karl Berry Thu, 22 Dec 2022 14:40:37 -0800

A file with a name that has some "eight-bit" UTF-8 bytes (fn...-utf8.tex)
was committed to one of my repositories. When I try to check it out in
the C locale, svn complains:


$ echo $LC_ALL
C
$ svn update
svn: E000022: Can't convert string from 'UTF-8' to native encoding:
svn: E000022: fn{U+00B1}{U+00D7}{U+00F7}{U+00A7}{U+00B6}-utf8.tex

Or, in ls terms:
$ ls --quoting-style escape fn??*-utf8.tex
fn\302\261\303\227\303\267\302\247\302\266-utf8.tex
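
To make the mismatch concrete, here is a minimal Python sketch (not svn's actual code path, only an illustration) that takes the bytes from the ls output above, decodes them as UTF-8, and tries to re-encode the name as ASCII. The helper name convertible_to_ascii is hypothetical.

```python
# Bytes of the filename, copied from the `ls --quoting-style escape` output
# (Python bytes literals accept the same octal escapes).
raw = b"fn\302\261\303\227\303\267\302\247\302\266-utf8.tex"

# Decode as UTF-8 and list the non-ASCII code points.
name = raw.decode("utf-8")
code_points = [f"U+{ord(ch):04X}" for ch in name if ord(ch) > 0x7F]


def convertible_to_ascii(s):
    """Hypothetical helper: True if s can be encoded as 7-bit ASCII."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(code_points)
print(convertible_to_ascii(name))
```

The code points it lists are the same five that svn prints in its error message, and the ASCII encoding attempt fails, which is the same kind of failure svn reports for the C locale.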

Clearly those non-ASCII code points cannot be "converted" by svn to the
7-bit ASCII locale that is "C". Fine; I don't expect it to.  Is there a
way to force svn to complete the checkout anyway? That is, just check
out the file and let the name be whatever the bytes are. I don't
understand why any "conversion" by svn is necessary merely to operate on
files.
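
What I mean by "whatever the bytes are" can be sketched in Python as follows. This assumes a Linux-style filesystem that treats names as opaque byte strings (other platforms may behave differently), and it says nothing about how svn works internally; it only shows that the name can be created and listed back with no conversion at all.

```python
import os
import tempfile

# The same filename bytes as in the ls output above.
raw_name = b"fn\302\261\303\227\303\267\302\247\302\266-utf8.tex"

with tempfile.TemporaryDirectory() as tmp:
    # Build the path entirely from bytes so no encoding step is involved.
    path = os.path.join(os.fsencode(tmp), raw_name)
    with open(path, "wb") as f:
        f.write(b"test\n")

    # Listing a bytes directory returns the names as raw bytes.
    listed = os.listdir(os.fsencode(tmp))

print(listed == [raw_name])
```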

Sure, the name may show up as garbage when I do things in my terminal,
but that's my problem, not svn's. I didn't ask (and don't want) svn to
convert anything.

Incidentally, this is not about UTF-8 specifically. The same commit
included names in SJIS and EUC encodings (they are test files for a new
feature in Japanese TeX). The question is, in general, why svn needs to
"convert" filenames at all.

I did some searching both in the mailing list archives and on the web,
to no avail. People had related problems, but I didn't see this (more
basic) question being asked.

This is with a somewhat old svn that I compiled myself:
svn, version 1.13.0 (r1867053)
   compiled Nov 10 2019, 18:06:58 on x86_64-unknown-linux-gnu

I'm guessing svn behavior in this regard has not changed since 1.13.0,
but if I'm wrong about that, sorry for the noise, and I'll happily
recompile the latest.

Thanks for any info,
Karl
