Hi Mark,
when I read Neville's post I also thought about normalization, although I
really do not have a clue about the whole Unicode stuff. I just remembered that
the standalone builder makes use of the normalize function. ;)
So I used this script on LC Server to write the seconds to a file with an
a-umlaut in its name:
put normalizeText("testä.txt", "NFC") into tFile
put the seconds into URL ("binfile:"&tFile)
put the result
put "<br><br>"
put the files
put "<br><br>"
put tFile
But that does not work. "The result" returns 'can't open file'.
As I already wrote, I have no clue about Unicode, so I also tried NFD and the
other two forms, but without success either.
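To at least see what the forms do to the name, here is a small sketch in Python rather than LiveCode. It assumes that Python's unicodedata.normalize behaves like normalizeText for NFC and NFD. It only prints the code point count and the UTF-8 bytes of the file name in both forms and does not touch the file system:

```python
import unicodedata

# "testä.txt" written with the precomposed a-umlaut (U+00E4)
name = "test\u00e4.txt"

nfc = unicodedata.normalize("NFC", name)  # composed form
nfd = unicodedata.normalize("NFD", name)  # decomposed form: "a" + U+0308

# Code point count and UTF-8 bytes (as hex) for each form
print(len(nfc), nfc.encode("utf-8").hex(" "))
print(len(nfd), nfd.encode("utf-8").hex(" "))

# The two names look the same on screen but are not the same bytes
same_bytes = nfc.encode("utf-8") == nfd.encode("utf-8")
print(same_bytes)
```

So the two forms do give different byte sequences for the same visible name, which fits Mark's description. It does not explain why the file cannot be opened with either form.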
Is there something else that one has to keep in mind to make this work?
Regards,
Matthias
> On 14.08.2023 at 12:22, Mark Waddingham via use-livecode
> <[email protected]> wrote:
>
> On 2023-08-14 02:45, Neville Smythe via use-livecode wrote:
>> OK, so the macOS *is* using utf-8 for its file names - the [e-acute] in the
>> filename Carré.txt is rendered with two bytes [C3A9] not the single byte
>> MacRoman encoding. I got tricked by copying the terminal listing into
>> another program rather than hex dumping within the terminal, and somewhere
>> in the process the native encoding was preferred.
>> So one must *not* textEncode a filename to utf-8 before writing a file to
>> disk; LC deals with the encoding, although you *should* textEncode its
>> contents.
>> Which leaves the problem of why I can’t get LC Server on Linux to write
>> non-ascii filenames
>
> So I suspect the problem here is normalization, rather than the inability of
> Linux to write non-ascii filenames.
>
> Characters such as e-acute / e-grave have *two* representations in unicode -
> the decomposed and composed form.
>
> The composed form is a direct mapping from the native encodings and is a
> single codepoint, the decomposed form will be two codepoints - (e,
> combining-acute/grave)
>
> Depending on where the string comes from it might either be composed or
> decomposed - macOS filenames are stored decomposed in the FS, but the
> higher-level parts of the OS make either form work (in a similar fashion to
> how macOS filesystems are case-insensitive by default).
>
> Linux filesystems, however, are both case-sensitive and form-sensitive - a
> filename must match byte to byte with what it was created with (indeed, linux
> filesystems care nothing for encodings, they see filenames as a sequence of
> bytes which are interpreted relative to the user's current locale - the
> default locale on linux these days is utf-8).
>
> If your app is managing the files completely on Linux (i.e. it is creating /
> deleting them and the filenames are not user-editable) then (if this is the
> case) the problem should be fixable by choosing a normalization form when
> you create / lookup the file - i.e. pass all filenames on the server through
> `normalizeText(<str>, <form>)` - here you want form to be either "NFC"
> (composed) or "NFD" (decomposed).
>
> Warmest Regards,
>
> Mark.
>
> P.S. For all the gory details about Unicode normalization forms see -
> https://unicode.org/reports/tr15/
>
> --
> Mark Waddingham ~ [email protected] ~ http://www.livecode.com/
> LiveCode: Build Amazing Things
>
> _______________________________________________
> use-livecode mailing list
> [email protected]
> Please visit this url to subscribe, unsubscribe and manage your subscription
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode