Hi Mark,

When I read Neville's post I also thought of normalization. I really do not
have a clue about the whole Unicode stuff, but I remembered that the
standalone builder makes use of the normalize function. ;)

So I used this script on LC Server to write the seconds to a file with an
a-umlaut in its name:

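-- normalize the file name to the composed form before using it
put normalizeText("testä.txt", "NFC") into tFile
-- write the current seconds to the file
put the seconds into URL ("binfile:" & tFile)
-- "the result" is empty on success, otherwise it holds the error
put the result
put "<br><br>"
-- list the files in the current folder
put the files
put "<br><br>"
put tFile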

But that does not work: "the result" returns "can't open file".
As I already wrote, I have no clue about Unicode, so I also tried NFD and
the other two options, but also without success.
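
One way to see what normalizeText actually produces would be to print the
codepoint and UTF-8 byte counts for both forms. This is only a sketch and not
a fix: tName, tForm, tFile, tCodepoints and tBytes are made-up variable names,
and the name is built with numToCodepoint(228) (the a-umlaut) on the
assumption that this keeps the encoding of the script file itself out of the
picture.

```livecode
-- Sketch only: compare the NFC and NFD forms of the same file name.
-- numToCodepoint(228) is the a-umlaut (U+00E4).
put "test" & numToCodepoint(228) & ".txt" into tName
repeat for each item tForm in "NFC,NFD"
   put normalizeText(tName, tForm) into tFile
   -- number of Unicode codepoints in the normalized name
   put the number of codepoints in tFile into tCodepoints
   -- number of bytes once the name is encoded as UTF-8
   put the number of bytes in textEncode(tFile, "UTF-8") into tBytes
   put tForm & ": " & tCodepoints & " codepoints, " & tBytes & " UTF-8 bytes<br>"
end repeat
```

For this name, NFC should report 9 codepoints and 10 UTF-8 bytes, and NFD 10
codepoints and 11 UTF-8 bytes. The same check run on the literal "testä.txt"
would show whether the name in the script arrives as intended.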

Is there something else that one has to keep in mind to make this work?


Regards,
Matthias



> On 14.08.2023 at 12:22, Mark Waddingham via use-livecode
> <use-livecode@lists.runrev.com> wrote:
> 
> On 2023-08-14 02:45, Neville Smythe via use-livecode wrote:
>> OK, so the macOS *is* using utf-8 for its file names - the [e-acute] in the 
>> filename Carré.txt is rendered with two bytes [C3A9] not the single byte 
>> MacRoman encoding. I got tricked by copying the terminal listing into 
>> another program rather than hex dumping within the terminal, and somewhere 
>> in the process the native encoding was preferred.
>> So one must *not* textEncode a filename to utf-8 before writing a file to 
>> disk, LC deals with the encoding, although you *should* textEncode its 
>> contents.
>> Which leaves the problem of why I can’t get LC Server on Linux to write 
>> non-ascii filenames
> 
> So I suspect the problem here is normalization, rather than the inability of 
> Linux to write non-ascii filenames.
> 
> Characters such as e-acute / e-grave have *two* representations in unicode - 
> the decomposed and composed form.
> 
> The composed form is a direct mapping from the native encodings and is a 
> single codepoint, the decomposed form will be two codepoints - (e, 
> combining-acute/grave)
> 
> Depending on where the string comes from it might either be composed or 
> decomposed - macOS filenames are stored decomposed in the FS, but the 
> higher-level parts of the OS make either form work (in a similar fashion to 
> how macOS filesystems are case-insensitive by default).
> 
> Linux filesystems, however, are both case-sensitive and form-sensitive - a 
> filename must match byte to byte with what it was created with (indeed, linux 
> filesystems care nothing for encodings, they see filenames as a sequence of 
> bytes which are interpreted relative to the user's current locale - the 
> default locale on linux these days is utf-8).
> 
> If your app is managing the files completely on Linux (i.e. it is creating / 
> deleting them and the filenames are not user-editable) then (if this is the 
> case) the problem should be fixable by choosing a normalization form when 
> you create / lookup the file - i.e. pass all filenames on the server through 
> `normalizeText(<str>, <form>)` - here you want form to be either "NFC" 
> (composed) or "NFD" (decomposed).
> 
> Warmest Regards,
> 
> Mark.
> 
> P.S. For all the gory details about Unicode normalization forms see - 
> https://unicode.org/reports/tr15/
> 
> -- 
> Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
> LiveCode: Build Amazing Things
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription 
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode


