On 12/30/2022 4:40 PM, Karl Berry wrote:
>> Well my point is that this would not work everywhere.
>
> How can "store as bytes" not work (be implementable?) everywhere? I'm
> missing something.

I seem to remember an earlier message in the thread mentioning
Windows... where filenames are natively UTF-16LE, and the various
file/path API functions have a "wide character" version that takes
UTF-16LE-encoded filenames, and an "ANSI" version that converts
single-byte/multi-byte charset filenames to/from UTF-16LE. The source
encoding for the "ANSI" version is generally determined by a system-wide
setting; e.g., in the US, most systems would be using Windows-1252,
which is similar to ISO 8859-1, although not identical, and Japanese
systems would probably be set to use code page 932, which is basically
Shift-JIS.
So if SVN on Windows used the UTF-16 APIs and "stored as bytes", it'd be
incompatible with *nix: a *nix README file, 0x52 0x45 0x41 0x44 0x4D
0x45, would turn into U+4552 U+4441 U+454D, or 䕒䑁䕍, on Windows. And what
happens with filenames that are an odd number of bytes long? Or in the
other direction, a README file committed from Windows couldn't be
checked out on *nix because 0x52 0x00 0x45 0x00 etc. would appear to
contain NUL characters from the *nix POV.
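
To make that concrete, here is a rough Python sketch of the same byte
reinterpretation. It only uses Python's own codecs as a stand-in for what
the wide-character APIs would see; it is not what SVN actually does, and
the variable names are made up for illustration.

```python
# Sketch (assumption: Python's "utf-16-le" codec stands in for the
# Windows wide-character APIs; none of this is SVN code).

nix_name = b"README"  # 0x52 0x45 0x41 0x44 0x4D 0x45

# Windows would read the stored bytes two at a time as UTF-16LE code units.
seen_on_windows = nix_name.decode("utf-16-le")
print([f"U+{ord(ch):04X}" for ch in seen_on_windows])

# An odd number of bytes cannot be split into 16-bit code units at all.
try:
    b"READ.ME".decode("utf-16-le")  # 7 bytes
    odd_length_ok = True
except UnicodeDecodeError:
    odd_length_ok = False

# Other direction: the bytes Windows would store for "README".
windows_name = "README".encode("utf-16-le")
has_nul = b"\x00" in windows_name  # *nix filenames cannot contain NUL
print(windows_name, has_nul)
```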
And if SVN on Windows used the single-byte charset APIs, the README
example would work, but any filenames with non-ASCII characters would
either change depending on the system-wide locale setting, or perhaps
could not be checked out at all.
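
A similar sketch for the single-byte/multi-byte case, again using Python
codecs ("cp932" and "cp1252") as an approximation of the system-wide
"ANSI" setting. The filename is an arbitrary example, not something from
a real repository.

```python
# Sketch (assumption: Python's "cp932" and "cp1252" codecs approximate the
# conversion the "ANSI" APIs do on Japanese and US systems respectively).

name = "テスト"             # hypothetical filename committed on a Japanese system
raw = name.encode("cp932")  # the bytes that would be "stored as bytes"

as_japanese = raw.decode("cp932")  # same locale: round-trips
as_us = raw.decode("cp1252")       # US locale: same bytes, different name
print(as_japanese, as_us)

# Going through Unicode instead does not help a locale that lacks the
# characters: the name simply cannot be represented there.
try:
    name.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```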
So as a Windows user, I think it's good that SVN converts filenames by
default. That said, perhaps it would be useful to have an svn: property
or something that says not to do any conversion on this filename.
--
Name: Dave Huang | Mammal, mammal / their names are called /
INet: k...@azeotrope.org | they raise a paw / the bat, the cat /
| dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 47 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++