On Thu, Jan 12, 2023 at 09:20:34AM +0100, tlaro...@polynum.com wrote:
> I don't know if this is for tech-kern or tech-userlevel (perhaps the
> two).
>
> I just read today, on the devel UEFI edk2 devel list, from patches for
> ext4, a comment on the problem of the encoding of dir entries.
>
> The problem is that, generally in fs, no encoding is specified: dir
> entries are just a sequence of bytes, whether nul byte terminated or
> with the length of the entry given (the later for ext4).
>
> UEFI (edk2) deals, internally, with UCS-2 strings.
>
> With ext4 (and I expect this is the same for other fs drivers),
> conversion is attempted from utf-8. Here, if the "from utf-8" conversion
> errors (not utf-8), the dir entry is skipped, meaning that not anything
> on a fs read can be reached by the UEFI code.
>
> This has to be kept in mind when populating a msdos partition for
> booting and for people wandering in a filesystem using the UEFI shell:
> even if the fs is readable, perhaps not everything will be accessible.
Not a problem: none of our boot code is likely to use anything beyond ASCII-compatible code points, and for the foreseeable future we'll be using the FAT-formatted ESP, where the long file name support is supposed to be UCS-2 anyway (that is, not UTF-16). If you need multi-astral-plane-codepoint Unicode emoji to boot an OS, you're doing something very wrong.
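
For illustration only, here is a rough Python sketch of the "skip the entry if it does not convert" behaviour described in the quoted text. It is not the edk2 code. The helper names `ucs2_name` and `visible_entries` are made up for this example, and rejecting code points above U+FFFF is my assumption about what a strict UCS-2 conversion implies; the quoted description only mentions entries that are not valid UTF-8.

```python
def ucs2_name(raw: bytes):
    """Return the name as a list of UCS-2 code units, or None if it cannot
    be represented (hypothetical helper, not an edk2 function)."""
    try:
        text = raw.decode("utf-8")  # strict decoding: raises on invalid UTF-8
    except UnicodeDecodeError:
        return None
    # Assumption: UCS-2 has no surrogate pairs, so anything outside the
    # Basic Multilingual Plane (above U+FFFF) is treated as unrepresentable.
    if any(ord(ch) > 0xFFFF for ch in text):
        return None
    return [ord(ch) for ch in text]


def visible_entries(names):
    """Keep only the directory entry names that survive the conversion."""
    return [name for name in names if ucs2_name(name) is not None]
```

Under these assumptions a plain ASCII name such as `b"netbsd"` passes through unchanged, while a Latin-1 encoded name such as `b"caf\xe9"` or a name containing an emoji would be dropped from the listing.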