Tim Haley wrote:
> Roland Mainz wrote:
> > Bart Smaalders wrote:
> >> Marcus Sundman wrote:
> >>> I'm unable to find more info about this. E.g., what does "reject file
> >>> names" mean in practice? E.g., if a program tries to create a file
> >>> using a utf8-incompatible filename, what happens? Does the fopen()
> >>> fail? Would this normally be a problem? E.g., do tar and similar
> >>> programs convert utf8-incompatible filenames to utf8 upon extraction if
> >>> my locale (or wherever the fs encoding is taken from) is set to use
> >>> utf-8? If they don't, then what happens with archives containing
> >>> utf8-incompatible filenames?
> >> Note that the normal ZFS behavior is exactly what you'd expect: you
> >> get the filenames you wanted; the same ones back you put in.
> >
> > Does ZFS convert the strings to UTF-8 in this case or will it just store
> > the multibyte sequence unmodified?
> >
> ZFS doesn't muck with names it is sent when storing them on-disk.  The
> on-disk name is exactly the sequence of bytes provided to the open(),
> creat(), etc.  If normalization options are chosen, it may do some
> manipulation of the byte strings *when comparing* names, but the on-disk
> name should be untouched from what the user requested.
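A minimal Python sketch of the "store verbatim, normalize only when comparing" behaviour described above. It is not ZFS code: the class `NormalizingDirectory` and its methods are hypothetical, NFD is assumed as the comparison form (loosely modelled on a dataset with `normalization=formD`), and treating non-UTF-8 names as raw bytes is an assumption of the sketch:

```python
from __future__ import annotations

import unicodedata


class NormalizingDirectory:
    """Hypothetical directory model: names are stored byte-for-byte as
    given, but lookups and collision checks compare normalized forms."""

    def __init__(self, form: str = "NFD") -> None:
        self.form = form
        # Comparison key -> the exact bytes the caller passed to create().
        self._entries: dict[bytes, bytes] = {}

    def _key(self, name: bytes) -> bytes:
        try:
            text = name.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption of this sketch: non-UTF-8 names compare as raw bytes.
            return name
        return unicodedata.normalize(self.form, text).encode("utf-8")

    def create(self, name: bytes) -> None:
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = name  # stored untouched, never normalized

    def lookup(self, name: bytes) -> bytes | None:
        """Return the stored (original) name that matches `name`, if any."""
        return self._entries.get(self._key(name))

    def listdir(self) -> list[bytes]:
        return list(self._entries.values())


if __name__ == "__main__":
    d = NormalizingDirectory()
    composed = "caf\u00e9".encode("utf-8")     # U+00E9 as one code point
    decomposed = "cafe\u0301".encode("utf-8")  # "e" + U+0301 combining acute
    d.create(composed)
    print(d.lookup(decomposed) == composed)  # True: matched after normalizing
    print(d.listdir())                       # [b'caf\xc3\xa9']: bytes unchanged
```

The design point is that the normalized form is only ever used as a lookup key; what gets returned by `listdir()` is always the caller's original byte sequence.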

Ok... that was the part which I was _praying_ for... :-)

... just some background (for those who may be puzzled by the statement
above): the conversion to Unicode is not always lossless, even though
Unicode is sometimes marketed as
"convert-any-encoding-to-Unicode-without-losing-any-information". For
example, if you have a mixed-language ISO-2022 character sequence, the
conversion to Unicode will lose the language information itself, so
converting it back to ISO-2022 will produce a different multibyte
sequence than the original input. The issue could be worked around by
inserting the Unicode "language tag" characters to preserve this
information, but almost no converter does that. And since these tags
are outside the BMP, you have to pray that everything in the toolchain
works with Unicode characters beyond 65535 (U+FFFF) ... ;-(
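A small Python sketch of the ISO-2022 case, assuming CPython's built-in `iso2022_jp_2` codec stands in for "the converter". The character U+4E2D (present in both GB 2312 and JIS X 0208) and the names `round_trip`, `GB2312_ZHONG` and `LANGUAGE_TAG` are purely illustrative. The bytes designate GB 2312 (Chinese); after a trip through Unicode the encoder can no longer know that, so it picks its own character set (in CPython this is expected to be JIS X 0208) and emits different bytes for the same text:

```python
from __future__ import annotations


def round_trip(original: bytes, codec: str = "iso2022_jp_2") -> tuple[str, bytes]:
    """Decode ISO-2022 bytes to Unicode, then encode the text again."""
    text = original.decode(codec)    # only the code point survives...
    return text, text.encode(codec)  # ...so the encoder picks its own charset


# U+4E2D written as a GB 2312 character: ESC $ A designates GB 2312,
# 0x56 0x50 is the character, ESC ( B switches back to ASCII.
GB2312_ZHONG = b"\x1b$A\x56\x50\x1b(B"

# Unicode "language tag" characters live in plane 14 (U+E0000..U+E007F).
LANGUAGE_TAG = "\U000E0001"

if __name__ == "__main__":
    text, reencoded = round_trip(GB2312_ZHONG)
    print(text == "\u4e2d")           # True: same code point either way
    print(reencoded == GB2312_ZHONG)  # False: the GB 2312 designation is gone
    print(reencoded.decode("iso2022_jp_2") == text)  # True: same text, new bytes
    # Outside the BMP, so UTF-16 needs a surrogate pair (4 bytes) per tag.
    print(ord(LANGUAGE_TAG) > 0xFFFF, len(LANGUAGE_TAG.encode("utf-16-le")))
```

The text compares equal, but the byte sequence does not, which is exactly why a filesystem that silently converted names to UTF-8 could not hand back the bytes the user originally gave it.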

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
