> Well, what I was wondering is if we need to load any
> data at all (except english for compatibility).

I don't think we need to, it's just the way it was done. I didn't consider it until you brought it up, but it would make sense to leave the format alone and load everything at runtime depending on what's available. Reading between the lines of luatex-hyphen.pdf, I'm guessing the authors were envisaging a transition period where installations both with and without the patterns in the format could exist; that was at a time when packages were actively developed for LuaTeX (luatex-hyphen was started in 2010). This didn't happen; when I ported Polyglossia to LuaTeX two years later I didn't consider the case where patterns were available in the format, and expected to have to load them on demand anyway. I suppose you did the same for Babel.

> Furthermore,
> I think this improves compatibility, because we never
> know how the format was built (and also think of local
> patterns).

I was thinking along those lines too: having metadata in the format makes the system *less* portable, since we then need to have at runtime the exact data the metadata was referring to.

> This also open hyphenation to packages (patterns
> based on rules are easy to program even in TeX, as mkpattern
> does, for example).

That's certainly useful for some applications.

>> http://tug.org/TUGboat/tb29-3/tb93miklavec.pdf
>
> Well, but it's not CTAN :-).

It is quite easy to reach from CTAN if you look for documentation. The top-level README for hyph-utf8 mentions the hyphenation page on tug.org, which has a link to the TUGboat article. This of course demands a small amount of research and a little bit of navigation, but that's always going to be the case for any package -- although a tiny improvement is now possible: CTAN has very recently started allowing package writers to put their README in Markdown format in order to display it directly on the package page.
Until now there was only a link to it from http://www.ctan.org/pkg/hyph-utf8 -- I've now converted our README to Markdown in order to remove one level of indirection.

More generally, I think that this kind of information is best written up as an article and published in a place such as TUGboat, especially considering the very low level of awareness of BCP 47. That is a pity, because it's a very useful standard, and particularly well suited to identifying languages in the TeX world -- so yes, I'd much rather have people ask questions about it on mailing lists in order to spread the word :-)

Best,
Arthur
