Re: filename encodings and conversion failure

David Huang Fri, 30 Dec 2022 15:25:39 -0800

On 12/30/2022 4:40 PM, Karl Berry wrote:

     Well my point is that this would not work everywhere.


How can "store as bytes" not work (be implementable?) everywhere?  I'm
missing something.

I seem to remember an earlier message in the thread mentioning Windows... where filenames are natively UTF-16LE, and the various file/path API functions have a "wide character" version that takes UTF-16LE-encoded filenames, and an "ANSI" version that will convert single-byte/multi-byte charset filenames to/from UTF-16LE (where the source encoding is generally determined by a system-wide setting; e.g., in the US, most systems would be using Windows-1252, which is similar to ISO 8859-1, although not identical. And Japanese systems would probably be set to use Windows-932, which is basically Shift-JIS).

So if SVN on Windows used the UTF-16 APIs and "stored as bytes", it'd be incompatible with *nix: a *nix README file, 0x52 0x45 0x41 0x44 0x4d 0x45 would turn into U+4552 U+4441 U+454D, or 䕒䑁䕍 in Windows. And what happens with filenames that are an odd number of bytes long? Or in the other direction, a README file committed from Windows couldn't be checked out on *nix because 0x52 0x00 0x45 0x00 etc would appear to contain NUL characters from the *nix POV.
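To make that concrete, here's a rough Python sketch of the byte reinterpretation. It only simulates the idea with Python's codecs; it isn't anything SVN actually does, and the variable and function names are made up for illustration.

```python
# Bytes of the name "README" as a *nix client would store them (ASCII).
nix_name = b"README"

# Reinterpreting those six bytes as UTF-16LE pairs them up into
# three 16-bit code units instead of six characters.
as_utf16 = nix_name.decode("utf-16-le")
code_points = [f"U+{ord(c):04X}" for c in as_utf16]


def decodes_as_utf16le(raw: bytes) -> bool:
    """Return True if the raw bytes form a complete UTF-16LE sequence."""
    try:
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


# The other direction: the bytes a UTF-16 Windows client would store.
win_name = "README".encode("utf-16-le")
has_nul = b"\x00" in win_name

print(code_points)                     # three code points, not six letters
print(decodes_as_utf16le(b"NEWS.md"))  # 7 bytes: one byte is left over
print(win_name, has_nul)
```

An odd-length name such as the 7-byte "NEWS.md" leaves a dangling byte, so a strict decoder rejects it, and the Windows-side bytes contain a NUL after every ASCII letter.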

And if SVN on Windows used the single-byte charset APIs, the README example would work, but any filenames with non-ASCII characters would either change depending on the system-wide locale setting, or perhaps not be able to be checked out at all.
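Here's a similar hedged sketch for the code page case. It uses Python's "cp1252" and "cp932" codecs as stand-ins for the Windows ANSI code pages; the real "ANSI" API functions may behave differently in detail (best-fit mappings and so on), and the filenames and helper name are just examples.

```python
# A name committed "as bytes" from a system using Windows-1252.
raw = "café.txt".encode("cp1252")

# The same bytes read back under two different system code pages.
on_us_system = raw.decode("cp1252")
# errors="replace" so the sketch prints something instead of raising.
on_jp_system = raw.decode("cp932", errors="replace")


def representable(name: str, codepage: str) -> bool:
    """Return True if every character of name exists in the code page."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


print(repr(on_us_system), repr(on_jp_system))
print(representable("日本語.txt", "cp932"))   # fine on a Japanese system
print(representable("日本語.txt", "cp1252"))  # no mapping in Windows-1252
```

The first print shows the name changing with the locale setting; the last one is the "can't be checked out at all" case, since the code page has no way to express the name.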

So as a Windows user, I think it's good that SVN converts filenames by default. That said, perhaps it would be useful to have an svn: property or something that says not to do any conversion on this filename.


--
Name: Dave Huang         |  Mammal, mammal / their names are called /
INet: k...@azeotrope.org |  they raise a paw / the bat, the cat /
                         |  dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 47 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++
