On 12/30/2022 4:40 PM, Karl Berry wrote:
     Well my point is that this would not work everywhere.

How can "store as bytes" not work (be implementable?) everywhere?  I'm
missing something.

I seem to remember an earlier message in the thread mentioning Windows, where filenames are natively UTF-16LE. The various file/path API functions have a "wide character" version that takes UTF-16LE-encoded filenames, and an "ANSI" version that converts single-byte/multi-byte charset filenames to/from UTF-16LE. For the ANSI version, the source encoding is generally determined by a system-wide setting; e.g., in the US, most systems would be using Windows-1252, which is similar to ISO 8859-1, although not identical, and Japanese systems would probably be set to use code page 932 (Windows-932), which is basically Shift-JIS.
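To make the two API families concrete, here's a minimal Python sketch (Python is just my choice for illustration; on Windows this happens in C via function pairs like CreateFileW/CreateFileA). It shows how the same filename would look to a wide-character API (UTF-16LE) and to an "ANSI" API under two example code pages. The function name `filename_bytes` and the choice of code pages are assumptions for the example.

```python
from typing import Dict, Optional

# Example code pages: cp1252 (typical US/Western European default) and
# cp932 (Japanese, essentially Shift-JIS). Both are illustrative choices.
ANSI_CODEPAGES = ("cp1252", "cp932")


def filename_bytes(name: str) -> Dict[str, Optional[bytes]]:
    """Show how `name` is represented for the wide (UTF-16LE) API and for
    each example ANSI code page; None means that code page can't encode it."""
    result: Dict[str, Optional[bytes]] = {"utf-16-le": name.encode("utf-16-le")}
    for codepage in ANSI_CODEPAGES:
        try:
            result[codepage] = name.encode(codepage)
        except UnicodeEncodeError:
            result[codepage] = None
    return result


if __name__ == "__main__":
    for name in ("README", "café", "日本.txt"):
        print(name, filename_bytes(name))
    # cp1252 is close to ISO 8859-1 but not identical: byte 0x80 is the
    # euro sign in cp1252 but a C1 control character in ISO 8859-1.
    print(repr(b"\x80".decode("cp1252")), repr(b"\x80".decode("latin-1")))
```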

So if SVN on Windows used the UTF-16 APIs and "stored as bytes", it'd be incompatible with *nix: a *nix file named README (bytes 0x52 0x45 0x41 0x44 0x4d 0x45) would turn into U+4552 U+4441 U+454D, or 䕒䑁䕍, on Windows. And what happens with filenames that are an odd number of bytes long? In the other direction, a README file committed from Windows couldn't be checked out on *nix, because its bytes (0x52 0x00 0x45 0x00, etc.) would appear to contain NUL characters from the *nix POV, and NUL isn't allowed in *nix filenames.
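The mismatch can be reproduced with a short Python sketch that takes raw filename bytes, as a byte-preserving repository might store them, and reads them under the other platform's convention. The helper names are hypothetical, and this only models the decoding step, not anything SVN actually does.

```python
def as_seen_by_wide_api(raw: bytes) -> str:
    """Interpret raw *nix filename bytes as UTF-16LE, as a Windows client
    handing them unchanged to a wide-character API effectively would."""
    # An odd number of bytes isn't valid UTF-16LE; strict decoding raises.
    return raw.decode("utf-16-le")


def has_nul(raw: bytes) -> bool:
    """True if the bytes contain NUL, which *nix filenames cannot contain."""
    return b"\x00" in raw


if __name__ == "__main__":
    unix_name = b"README"                      # 0x52 0x45 0x41 0x44 0x4d 0x45
    print(as_seen_by_wide_api(unix_name))      # U+4552 U+4441 U+454D

    try:
        as_seen_by_wide_api(b"READ.ME")        # 7 bytes: odd length
    except UnicodeDecodeError as exc:
        print("odd-length name:", exc.reason)

    windows_name = "README".encode("utf-16-le")  # 0x52 0x00 0x45 0x00 ...
    print(has_nul(windows_name))               # True
```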

And if SVN on Windows used the single-byte charset APIs, the README example would work, but any filenames with non-ASCII characters would either change depending on the system-wide locale setting, or perhaps not be able to be checked out at all if the local code page can't represent them.
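A similar sketch shows the locale dependence of the "ANSI" route: the bytes a Japanese (cp932) system produces for a filename read back as something different on a US (cp1252) system, and a cp1252 system can't encode the name at all. The code page choices and the function name `transcode_via_ansi` are illustrative assumptions.

```python
from typing import Optional


def transcode_via_ansi(name: str, writer_cp: str, reader_cp: str) -> Optional[str]:
    """Encode `name` with the writer's code page, then decode those bytes with
    the reader's code page. Returns None if the writer's code page can't
    encode the name; undecodable bytes on the reader's side become U+FFFD."""
    try:
        raw = name.encode(writer_cp)
    except UnicodeEncodeError:
        return None
    return raw.decode(reader_cp, errors="replace")


if __name__ == "__main__":
    name = "日本.txt"
    print(transcode_via_ansi(name, "cp932", "cp932"))    # unchanged round trip
    print(transcode_via_ansi(name, "cp932", "cp1252"))   # mojibake
    print(transcode_via_ansi(name, "cp1252", "cp1252"))  # None: not encodable
```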

So as a Windows user, I think it's good that SVN converts filenames by default. That said, perhaps it would be useful to have an svn: property or something that says not to do any conversion on this filename.

--
Name: Dave Huang         |  Mammal, mammal / their names are called /
INet: k...@azeotrope.org |  they raise a paw / the bat, the cat /
                         |  dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 47 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++
