Well, an alternative to XML is JSON, which is more compact and faster and simpler to process. However, JSON has no explicit schema unless the schema is made part of the data itself, which complicates its structure (many levels of arrays of arrays). In that case it becomes harder for humans to read, but better suited to fast automated processing.
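To make that trade-off concrete, here is a minimal Python sketch of a JSON payload that carries its own schema as a list of column names, followed by arrays of arrays. The key names (`columns`, `rows`) and the sample records are assumptions chosen for illustration, not any actual CLDR or UCD format.

```python
import json

# Hypothetical payload: the "schema" (column names) travels with the data,
# and each record is a positional array. Compact, but harder to read by eye.
payload = '''
{
  "columns": ["code", "name", "category"],
  "rows": [
    ["0041", "LATIN CAPITAL LETTER A", "Lu"],
    ["0061", "LATIN SMALL LETTER A", "Ll"]
  ]
}
'''

def decode(text):
    """Turn the positional rows back into dictionaries keyed by column name."""
    doc = json.loads(text)
    columns = doc["columns"]
    return [dict(zip(columns, row)) for row in doc["rows"]]

records = decode(payload)
print(records[0]["name"])
```

A consumer that only knows some of the columns can still pick them out by name after decoding, but the raw file is no longer self-explanatory the way one object per record would be.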
I'd say that the XML alone is enough to generate any JSON-derived dataset conforming to the schema an application expects to process quickly (and containing just the data it can process, excluding the various extensions it does not yet implement). But the fastest implementations are based on data tables encoded in code (such as DLLs or Java classes) or on custom database formats (such as Berkeley DB), also generated automatically from the XML, without the processing cost of decompression schemes and parsers.

Even today, although XML is not the format applications usually consume, it is still the most interoperable format, allowing all sorts of applications to be built in all sorts of languages: the cost of parsing is left to an application builder/compiler. Some apps embed the compilers themselves and use a stored cache for faster processing; this approach allows easy updates by detecting changes in the XML source and then downloading them.

But in CLDR such updates are generally not automated: the general scheme evolves over time, and there are complex dependencies to check before some data becomes usable. Frequently you need to implement new algorithms to follow the processing rules documented in CLDR, or to use data that is not completely validated, or to let applications provide their own overrides where the CLDR datasets are insufficiently complete, even though CLDR provides a root locale and applications are supposed to follow the BCP47 fallback resolution rules. Applications also have their own needs regarding which language codes they use, and CLDR provides many locales that many applications are still not prepared to render correctly. Many application users also complain if an application is only partly translated and contains too many fallbacks to another language or, worse, to another script.

On Thu, 30 Aug 2018 at 20:38, Doug Ewell via Unicode <[email protected]> wrote:

> UnicodeData.txt was devised long before any of the other UCD data files.
> Though it might seem like a simple enhancement to us, adding a header
> block, or even a single line, would break a lot of existing processes that
> were built long ago to parse this file.
>
> So Unicode can't add a header to this file, and that is the reason the
> format can never be changed (e.g. with more columns). That is why new files
> keep getting created instead.
>
> The XML format could indeed be expanded with more attributes and more
> subsections. Any process that can parse XML can handle unknown stuff like
> this without misinterpreting the stuff it does know.
>
> That's why the only two reasonable options for getting UCD data are to
> read all the tab- and semicolon-delimited files, and be ready for new
> files, or just read the XML. Asking for changes to existing UCD file
> formats is kind of a non-starter, given these two alternatives.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
> -------- Original message --------
> Message: 3
> Date: Thu, 30 Aug 2018 02:27:33 +0200 (CEST)
> From: Marcel Schneider via Unicode <[email protected]>
>
> Curiously, UnicodeData.txt is lacking the header line. That makes it
> unflexible.
> I never wondered why the header line is missing, probably because compared
> to the other UCD files, the file looks really odd without a file header
> showing
> at least the version number and datestamp. It's like the file was made up
> for
> dumb parsers unable to handle comment delimiters, and never to be upgraded
> to do so.
>
> But I like the format, and that's why at some point I submitted feedback
> asking
> for an extension. [...]
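
As a rough illustration of the point quoted above (XML readers can skip what they do not know, while positional readers break when a column is added), here is a small Python sketch. The XML sample is a simplified stand-in loosely modelled on the UCD XML, without its namespace; the `future_prop` attribute, the `<future-section>` element, and both function names are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Simplified sample; "future_prop" and <future-section> are hypothetical
# additions that a later version of the data might introduce.
SAMPLE = """
<ucd>
  <repertoire>
    <char cp="0041" na="LATIN CAPITAL LETTER A" gc="Lu" future_prop="x"/>
    <char cp="0061" na="LATIN SMALL LETTER A" gc="Ll"/>
  </repertoire>
  <future-section><item/></future-section>
</ucd>
"""

def read_chars(xml_text):
    """Collect only the attributes this reader knows about; ignore the rest."""
    root = ET.fromstring(xml_text)
    return {
        int(ch.get("cp"), 16): (ch.get("na"), ch.get("gc"))
        for ch in root.iter("char")
    }

def read_line_strict(line):
    """Positional reader for a 15-field UnicodeData.txt line; extra columns fail."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return int(fields[0], 16), (fields[1], fields[2])

chars = read_chars(SAMPLE)
line = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
entry = read_line_strict(line)
print(chars[0x41], entry)
```

The XML reader returns the same result whether or not the unknown attribute and section are present, whereas `read_line_strict` raises `ValueError` as soon as a sixteenth field is appended to the line. Real parsers of UnicodeData.txt vary in how strict they are; this sketch assumes the strict kind.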

