On 2023-08-14 02:45, Neville Smythe via use-livecode wrote:
OK, so the macOS *is* using utf-8 for its file names - the [e-acute] in the filename Carré.txt is rendered with two bytes [C3A9] not the single byte MacRoman encoding. I got tricked by copying the terminal listing into another program rather than hex dumping within the terminal, and somewhere in the process the native encoding was preferred.

So one must *not* textEncode a filename to utf-8 before writing a file to disk; LC deals with the encoding, although you *should* textEncode its contents.

Which leaves the problem of why I can’t get LC Server on Linux to write non-ascii filenames.

So I suspect the problem here is normalization, rather than the inability of Linux to write non-ascii filenames.

Characters such as e-acute / e-grave have *two* representations in Unicode - the decomposed and composed forms.

The composed form is a direct mapping from the native encodings and is a single codepoint; the decomposed form is two codepoints - (e, combining-acute/grave).
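
To make the difference concrete, here is a small sketch in Python (not LiveCode; Python's `unicodedata.normalize` stands in for `normalizeText` purely for illustration). The helper name `utf8_hex` is made up for this example. It shows that the two forms of the same visible filename differ in both codepoint count and utf-8 bytes:

```python
import unicodedata

# Composed form: e-acute is the single codepoint U+00E9.
composed = "Carr\u00e9.txt"
# Decomposed form: 'e' followed by the combining acute accent U+0301.
decomposed = unicodedata.normalize("NFD", composed)

def utf8_hex(s):
    """Hypothetical helper: hex dump of the utf-8 encoding of s."""
    return s.encode("utf-8").hex()

print(len(composed), utf8_hex(composed))      # 9 43617272c3a92e747874
print(len(decomposed), utf8_hex(decomposed))  # 10 4361727265cc812e747874
print(composed == decomposed)                 # False
```

The `c3a9` in the first dump is the composed e-acute; the `65cc81` in the second is a plain `e` followed by the combining accent.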

Depending on where the string comes from it might either be composed or decomposed - macOS filenames are stored decomposed in the FS, but the higher-level parts of the OS make either form work (in a similar fashion to how macOS filesystems are case-insensitive by default).

Linux filesystems, however, are both case-sensitive and form-sensitive - a filename must match byte for byte with what it was created with (indeed, Linux filesystems care nothing for encodings, they see filenames as a sequence of bytes which are interpreted relative to the user's current locale - the default locale on Linux these days is utf-8).
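
One way to check how a given filesystem behaves is to create a file under one form and look it up under the other. The following is a rough Python sketch (again standing in for LiveCode); the function name `lookup_both_forms` is invented for this example, and the names are passed as raw utf-8 bytes so that the locale does not get in the way:

```python
import os
import tempfile
import unicodedata

def lookup_both_forms(name):
    """Create a file under its NFC name, then report which forms find it."""
    nfc = unicodedata.normalize("NFC", name).encode("utf-8")
    nfd = unicodedata.normalize("NFD", name).encode("utf-8")
    with tempfile.TemporaryDirectory() as d:
        base = os.fsencode(d)  # work with byte paths throughout
        nfc_path = os.path.join(base, nfc)
        nfd_path = os.path.join(base, nfd)
        with open(nfc_path, "wb") as f:
            f.write(b"hello")
        return {
            "NFC": os.path.exists(nfc_path),
            "NFD": os.path.exists(nfd_path),
        }

# The "NFD" result depends on the filesystem the temp directory lives on.
print(lookup_both_forms("Carr\u00e9.txt"))
```

On a typical Linux filesystem one would expect the NFD lookup to fail, while on macOS both lookups would be expected to succeed; the result is worth checking on the actual server rather than assuming.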

If your app is managing the files completely on Linux (i.e. it is creating / deleting them and the filenames are not user-editable) then the problem should be fixable by choosing a normalization form when you create / look up the file - i.e. pass all filenames on the server through `normalizeText(<str>, <form>)` - here you want form to be either "NFC" (composed) or "NFD" (decomposed).
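
As a sketch of that approach (in Python rather than LiveCode, with `unicodedata.normalize` playing the role of `normalizeText`), every filename goes through one helper before it touches the filesystem. The names `FORM`, `canonical_path` and `demo` are hypothetical, and the choice of NFC is an assumption; either form should work as long as it is used consistently:

```python
import os
import tempfile
import unicodedata

FORM = "NFC"  # assumption: pick one form and use it for every create / lookup

def canonical_path(directory, filename):
    """Join directory and filename after normalizing the filename to FORM."""
    normalized = unicodedata.normalize(FORM, filename)
    return os.path.join(os.fsencode(directory), normalized.encode("utf-8"))

def demo():
    """Create with a decomposed name, look up with a composed one."""
    composed = "Carr\u00e9.txt"
    decomposed = "Carre\u0301.txt"
    with tempfile.TemporaryDirectory() as d:
        with open(canonical_path(d, decomposed), "wb") as f:
            f.write("contents".encode("utf-8"))
        # Both spellings normalize to the same bytes, so the lookup matches.
        return os.path.exists(canonical_path(d, composed))

print(demo())
```

Because both the create and the lookup pass through the same normalization, it no longer matters which form the incoming string happened to be in.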

Warmest Regards,

Mark.

P.S. For all the gory details about Unicode normalization forms see - https://unicode.org/reports/tr15/

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Build Amazing Things
