On Wed, 10 Jul 2002, Barry Caplan wrote:
> At 08:43 AM 7/10/2002 -0400, Jungshik Shin wrote:
>
> >> In short: should I still stick to ASCII alone in filenames, or are there
> >> filesystems where I really don't have to anymore? Thanks in advance.
> >
> > Definitely/unconditionally no for NTFS. As for Linux ext2(and most other
> >Unix fs'), unless you mix up UTF-8 and legacy encodings (which you
> >wouldn't because you have never used non-ASCII), it's all right to switch
> >to UTF-8 and use non-ASCII chars.
>
> But be aware that such filenames may or may not be able to be
> transferred *across* file systems.

  You're absolutely right. Another related problem is normalization.
For instance, MacOS X uses one normalization form (NF) while NTFS uses
another. I haven't dug up what's planned about this on the Unix fs and
NFS front. Some Unix fs-related APIs may have to be extended to deal
with NFs.

> Not only that, but, although I haven't tested in detail for a while,
> I would not be fully comfortable with middleware that is responsible for
> managing file names across systems either, such as FTP, email attachments,
> and Samba. Particularly in the case of FTP and email, just because one
> client works does not mean another one will.

  Samba 3.0 appears to support Unicode
(see http://sambaaxp.org/xamba_XP_2002/vergeichick.pdf). BTW, from my
own experience, I know that codepage-based (non-Unicode encoding)
support in Samba 2.x works well between Win2k and Unix.

  As for email attachments, one should stick to IETF RFC 2231. Of
course, not all email clients are compliant with RFC 2231 (Mozilla and
Pine are among the compliant ones), but I think that's the best way to
get your filenames across. Even fewer web clients and servers abide by
RFC 2231 when it comes to the HTTP Content-Disposition header (the same
header used for email attachments). Actually, I haven't seen any: none
of Mozilla 1.x, Lynx 2.8, and MS IE 6 supports it. Hopefully, this will
change (see e.g. http://bugzilla.mozilla.org/show_bug.cgi?id=155949).

  Some IETF drafts and RFCs have been written about I18N of FTP and are
available at http://www.ietf.org/html.charters/ftpext-charter.html.
This is by no means to say that one can use Unicode (UTF-8) for FTP
right now, except when one uses Kermit.

> Also keep in mind that even if the file name transfers exactly correct,
> there is no guarantee, except, for ASCII characters, that the system
> will have fonts to display the file name.

  Well, not being able to display a filename is a problem of a
different dimension than not being able to get it across intact.
Moreover, two parties exchanging filenames in, say,
Chinese/Finnish/Thai/... are likely to have the necessary fonts.

  Jungshik Shin
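
  To illustrate the normalization problem mentioned above, here is a
minimal sketch in Python. It assumes the two forms involved are NFC
(precomposed) and NFD (decomposed); the message does not say which
system uses which, so they serve only as examples. The file name and
the helper same_filename() are hypothetical, made up for this
illustration.

```python
import unicodedata

# The same visible name, "café.txt", in two normalization forms.
# NFC and NFD are example forms only (an assumption of this sketch).
nfc_name = unicodedata.normalize("NFC", "cafe\u0301.txt")  # "é" as one code point
nfd_name = unicodedata.normalize("NFD", "caf\u00e9.txt")   # "e" + combining acute


def same_filename(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two names after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A plain string (or byte-wise) comparison treats the names as different.
print(nfc_name == nfd_name)
# Their UTF-8 encodings differ in length as well as content.
print(len(nfc_name.encode("utf-8")), len(nfd_name.encode("utf-8")))
# Normalizing both sides to one form before comparing makes them match.
print(same_filename(nfc_name, nfd_name))
```

  A file system or transfer tool that compares names byte by byte would
see two different files here, which is why normalizing to a single
agreed form at the boundary between systems is one plausible way to
handle it.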