On Sat, 15 Feb 2014 19:39:59 +0100 "Janusz S. Bien" <jsb...@mimuw.edu.pl> wrote:
> Quote/Cytat - Richard Wordingham <richard.wording...@ntlworld.com>
> (Sat 15 Feb 2014 07:25:51 PM CET):
>
> > Each precomposed character adds a small processing overhead to an
> > extremely large number of computers, not just to the computers that
> > actually use it.
>
> This is a very strong claim. Would you be so kind as to elaborate?

The following need to be stored simply because the character has been
assigned:

- name (typically for character pick-lists)
- script (typically for breaking text runs by script)
- casing (upper/lower/titlecase)
- collation properties (not strictly necessary)

There are many other properties, but most of them will often be covered
by default rules and may not need to be stored explicitly. The only
likely subsetting options I can think of would be to not support the
supplementary planes or to not support CJK characters.

This data will be moved when an operating system is installed, and the
files are liable to be moved or replaced at other times. I will concede
that it is possible this information may not need to be moved from disk
to memory: the data is likely to be ordered by codepoint, and if nearby
codepoints are never used either, it will not need to be loaded. Some
data files are mapped to memory, but unfortunately I can't comment on
the processing overhead of increasing their size if the additional data
is not accessed.

The operation that will be most significantly affected is composition.
I am assuming that composition information will be used even in the
presence of a composition exclusion, e.g. to select the best glyph from
a font. (One could optimise this away, though potentially at the cost
of rendering the canonical decomposition of a precomposed character
differently to the precomposed character.)

The composition data, consisting of the pairs of characters to which
precomposed characters decompose, will be stored in codepoint order of
the decomposition (a toy sketch of such a table follows in the
postscript below). The net effect is that the existence of unused
composition data will increase the number of cache misses, and thus
increase the amount of processing required. If there is not a separate
store of compositions not subject to composition exclusion, then the
same effect will occur whenever a composition happens as part of the
transformation of a character string to NFC or NFKC, e.g. in the
processing of a non-ASCII internet domain name.

If data access is not carefully optimised, there will be many more
occasions when unused decompositions will nevertheless add to the
processing load.

Richard.
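
P.S. To make the cache-miss point concrete, here is a minimal Python
sketch of one way a composition table could be laid out: pairs sorted by
the codepoints of the decomposition, searched by binary search. This
illustrates only the access pattern, not how ICU or any other library
actually encodes its tables; the codepoints shown are real, the layout
is hypothetical. Every precomposed character that gets encoded adds an
entry that every search has to step over, whether or not the character
is ever used:

    import bisect

    # Hypothetical composition table: ((first_cp, second_cp), composed_cp),
    # kept sorted by the decomposition pair.  Real implementations use
    # denser encodings, but lookups still walk data ordered this way.
    COMPOSITIONS = [
        ((0x0041, 0x0300), 0x00C0),  # A + combining grave      -> A-grave
        ((0x0045, 0x0301), 0x00C9),  # E + combining acute      -> E-acute
        ((0x004F, 0x0302), 0x00D4),  # O + combining circumflex -> O-circumflex
        # ... thousands more pairs; each newly encoded precomposed character
        # makes the table longer and spreads lookups over more cache lines.
    ]
    KEYS = [pair for pair, _ in COMPOSITIONS]

    def compose(first_cp, second_cp):
        """Return the precomposed codepoint for the pair, or None."""
        i = bisect.bisect_left(KEYS, (first_cp, second_cp))
        if i < len(KEYS) and KEYS[i] == (first_cp, second_cp):
            return COMPOSITIONS[i][1]
        return None

    print(hex(compose(0x0041, 0x0300)))   # 0xc0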
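
And for the domain-name point: composition to NFC (or NFKC) runs over
every non-ASCII label before it is encoded, so the composition data is
consulted even when the text already happens to be composed. A small
illustration using Python's standard unicodedata module and its
built-in IDNA codec (which implements the older IDNA 2003 pipeline; the
details differ in IDNA 2008, but normalisation still happens):

    import unicodedata

    # "bücher" typed in decomposed form: u + U+0308 COMBINING DIAERESIS.
    label = "bu\u0308cher"

    # Normalising to NFC is where the composition pairs are looked up.
    nfc = unicodedata.normalize("NFC", label)
    print(len(label), len(nfc))      # 7 6 -- u + diaeresis composed to U+00FC

    # Encoding the label for the DNS; the codec's nameprep step itself
    # normalises before the Punycode stage.
    print(label.encode("idna"))      # b'xn--bcher-kva'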