Hi Mark, when I read Neville's post I also thought about normalization, although I really do not have a clue about the whole Unicode stuff; I just remembered that the standalone builder makes use of the normalize function. ;)
So I used this script on LC Server to write the seconds to a file with an a-umlaut in its name:

   put normalizeText("testä.txt", "NFC") into tFile
   put the seconds into URL ("binfile:" & tFile)
   put the result
   put "<br><br>"
   put the files
   put "<br><br>"
   put tFile

But that does not work: "the result" returns 'can't open file'.

As I already wrote, I have no clue about Unicode, so I also tried NFD and the other two options, but also without success.

Is there something else that one has to keep in mind to have success with this?

Regards,
Matthias

> On 14.08.2023 at 12:22, Mark Waddingham via use-livecode
> <use-livecode@lists.runrev.com> wrote:
>
> On 2023-08-14 02:45, Neville Smythe via use-livecode wrote:
>> OK, so the macOS *is* using utf-8 for its file names - the [e-acute] in the
>> filename Carré.txt is rendered with two bytes [C3A9] not the single byte
>> MacRoman encoding. I got tricked by copying the terminal listing into
>> another program rather than hex dumping within the terminal, and somewhere
>> in the process the native encoding was preferred.
>> So one must *not* textEncode a filename to utf-8 before writing a file to
>> disk, LC deals with the encoding, although you *should* textEncode its
>> contents.
>> Which leaves the problem of why I can't get LC Server on Linux to write
>> non-ascii filenames
>
> So I suspect the problem here is normalization, rather than the inability of
> Linux to write non-ascii filenames.
>
> Characters such as e-acute / e-grave have *two* representations in Unicode -
> the decomposed and composed form.
>
> The composed form is a direct mapping from the native encodings and is a
> single codepoint, the decomposed form will be two codepoints - (e,
> combining-acute/grave)
>
> Depending on where the string comes from it might either be composed or
> decomposed - macOS filenames are stored decomposed in the FS, but the
> higher-level parts of the OS make either form work (in a similar fashion to
> how macOS filesystems are case-insensitive by default).
>
> Linux filesystems, however, are both case-sensitive and form-sensitive - a
> filename must match byte for byte with what it was created with (indeed, Linux
> filesystems care nothing for encodings, they see filenames as a sequence of
> bytes which are interpreted relative to the user's current locale - the
> default locale on Linux these days is utf-8).
>
> If your app is managing the files completely on Linux (i.e. it is creating /
> deleting them and the filenames are not user-editable) then (if this is the
> case) the problem should be fixable by choosing a normalization form when
> you create / lookup the file - i.e. pass all filenames on the server through
> `normalizeText(<str>, <form>)` - here you want form to be either "NFC"
> (composed) or "NFD" (decomposed).
>
> Warmest Regards,
>
> Mark.
>
> P.S. For all the gory details about Unicode normalization forms see -
> https://unicode.org/reports/tr15/
>
> --
> Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
> LiveCode: Build Amazing Things
>
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
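
To illustrate the composed/decomposed point from the quoted reply, here is a minimal sketch in Python (not LiveCode, and not taken from the thread). It assumes the filename is encoded as UTF-8, and the variable names are only illustrative. It shows that the NFC and NFD forms of the same visible name "testä.txt" are different byte sequences, which is why a byte-for-byte filesystem treats them as two different files.

```python
import unicodedata

# "testä.txt", written with the precomposed a-umlaut (U+00E4)
name = "test\u00e4.txt"

# Composed form: a-umlaut stays a single code point
nfc = unicodedata.normalize("NFC", name)
# Decomposed form: "a" followed by COMBINING DIAERESIS (U+0308)
nfd = unicodedata.normalize("NFD", name)

# Assumption: the filesystem sees the UTF-8 bytes of the name
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(len(nfc), nfc_bytes.hex(" "))
print(len(nfd), nfd_bytes.hex(" "))
print("same bytes:", nfc_bytes == nfd_bytes)
```

The two forms differ by one code point and one byte, so a lookup only succeeds if the same form is used consistently when creating and when opening the file. This does not by itself explain the 'can't open file' result above, since that attempt failed with every form tried.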