Begin forwarded message:
> From: David Carlisle <[email protected]>
> Date: 17 September 2010, 15:32:45 Pacific Daylight Time
> To: Alexey Proskuryakov <[email protected]>
> Subject: Re: [webkit-dev] Fwd: Fwd: HTML5 & MathML3 entities
>
> On 17/09/2010 22:57, Alexey Proskuryakov wrote:
>>
>> On 17.09.2010, at 14:28, David Carlisle wrote:
>>
>>> No, the code point is in the math symbols block and was always
>>> intended for math usage. Some time after the code point was added
>>> (I think, I don't have the data to hand) it was given a canonical
>>> mapping to the 3xxx block. That was an error that the Unicode
>>> Consortium is now trying to correct (or at least back when Unicode
>>> 3.x added this new character).
>>
>> I cannot follow this argument. My understanding is that adding a
>> single character canonical decomposition implies deprecation in
>> Unicode, so describing this as a two-step process confuses me.
>
> Adding a canonical decomposition doesn't imply deprecation.
> Depending on which canonical form is chosen, the canonicalisation mapping can
> go either way: loosely speaking, some forms prefer composite characters, some
> use combining characters in preference (not that combining characters are
> involved here).
>
> 2329 was deprecated some years after the canonical mapping was added
> because it was realised that that mapping was wrong, but mappings are never
> changed once added. It became deprecated not when the mapping to 3008 was
> added; it became deprecated when it was replaced by 27E8.
> I described it as a two-step process because it happened in two stages.
>
>
>
>> At the time I looked at this (and also currently) the deprecated
>> character had a canonical decomposition that made it equivalent to a
>> CJK character. Any software that treats this character as a math one
>> clearly violates many versions of the Unicode specs, including the
>> current one. It might have been conformant to Unicode 2.0 or some
>> earlier version though.
>
> It was conformant to Unicode 2, yes. The fact that Unicode then added a
> canonical form to 3xxx doesn't make them non-conformant: systems don't have
> to use NFC form and they don't have to use any particular glyph, so for
> either reason it's perfectly conformant to use a math character for 2329.
>
>>
>>> the lang and rang entity names come from the ISO math entity to
>>> denote math angle brackets. These sets and these names predate
>>> Unicode and predate HTML, it's unfortunate that after the names
>>> were mapped to unicode a canonical mapping to a different character
>>> was added, but
>>
>> I don't see how the origins of the debate change the fact that these
>> Unicode fonts you mentioned violated the Unicode spec. They may have
>> been doing "the right thing" or not, but arguing that they didn't
>> violate the letter of the spec seems strange.
>
> I don't think they violate the spec at all, except insofar as the spec was
> internally inconsistent once it had added a canonical mapping between two
> separate characters.
>>
>> Clearly, I have a different perspective, since I don't think that
>> things that pre-date HTML and Unicode should have much weight in
>> today's decisions.
>
> The point is that there have been documents using those entities as math
> character names in continuous use since the '80s; why should they all be
> broken? Not to mention the fact that the vast majority of uses of those
> entities in HTML will also be expecting a mathematical bracket (even if on
> some systems, with some fonts, the character glyph used was actually designed
> for CJK punctuation).
>
> In fact, where classical ISO usage and HTML usage differed I followed HTML
> usage in all cases (for all the obvious reasons), even when the HTML
> definitions make no sense at all (e.g. asymp), but in this case
> external factors (i.e. Unicode moving the goalposts) meant that the "new"
> Unicode 3.2 character should be used here.
>
>>
>>> the only fix the UTC suggest for that is just not using 2329 at all
>>> and use 27E8 instead. Which is what the entity spec recommends.
>>
>>
>> Did they actually suggest to use it for the lang entity in HTML, or
>> did they suggest to use it when a math character is desired?
>
> XHTML entities have document scope: it is not possible for an XHTML+MathML
> document to have different definitions for HTML and MathML use. But even for
> pure HTML use it is fairly clear that 27E8 is the correct choice.
>
> rang was never defined to be 3009; it was defined to be 232A and documented
> as being a math angle bracket. Unicode have deprecated 232A and suggest that
> any uses of that be replaced by 27E9 because 232A is effectively unusable, as
> it is subject to an essentially accidental and incorrect normalisation to
> 3009.
>
> It would be bizarre in the extreme to redefine rang to be 3009 (is there any
> evidence of anyone ever having used that entity name and wanting a CJK
> character?). The choices are doing what Unicode has suggested (since Unicode
> 3.2) and using 27E9 instead, or the alternative would be to declare that
> changing the HTML entities is just too scary and to leave it as 232A and
> live with the fact that this will be inconsistently rendered, and violates
> the W3C/Unicode charmod normal form rules, and is directly against the
> deprecation of this character in the Unicode specification. Of those two
> choices, defining it to be 27E9 seems to be the lesser of two evils.
>
> David
>
>>
>> - WBR, Alexey Proskuryakov
_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
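
The normalisation behaviour argued over in this thread can be checked directly. The following is a minimal sketch using Python's standard `unicodedata` module; it is not part of the original message, and the helper name `survives_normalization` and the dictionary names are made up for illustration. The output reflects whichever Unicode database ships with the interpreter, although canonical mappings are not expected to change once assigned.

```python
import unicodedata

# Code points as written in the thread (hex). The grouping is illustrative.
DEPRECATED = {"lang": "\u2329", "rang": "\u232A"}  # original entity targets
CJK = {"lang": "\u3008", "rang": "\u3009"}         # targets of the canonical mapping
MATH = {"lang": "\u27E8", "rang": "\u27E9"}        # replacements discussed in the thread


def survives_normalization(ch: str) -> bool:
    """Return True if ch is left unchanged by NFC, NFD, NFKC and NFKD."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, ch) == ch for form in forms)


for name in ("lang", "rang"):
    old, new = DEPRECATED[name], MATH[name]
    nfc_old = unicodedata.normalize("NFC", old)
    # Show where the deprecated character ends up after NFC, and whether
    # the mathematical replacement is stable under normalisation.
    print(
        f"{name}: U+{ord(old):04X} --NFC--> U+{ord(nfc_old):04X}; "
        f"U+{ord(new):04X} stable: {survives_normalization(new)}"
    )
```

Any pipeline that applies NFC (or any other normalisation form) to text containing 2329 or 232A will silently turn it into the CJK bracket, which is the practical reason the thread argues for 27E8 and 27E9.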

