[webkit-dev] Fwd: Fwd: Fwd: HTML5 & MathML3 entities

Alexey Proskuryakov Fri, 17 Sep 2010 15:35:56 -0700


Begin forwarded message:


> From: David Carlisle <[email protected]>
> Date: 17 September 2010 15:32:45 Pacific Daylight Time
> To: Alexey Proskuryakov <[email protected]>
> Subject: Re: [webkit-dev] Fwd: Fwd: HTML5 & MathML3 entities
> 
> On 17/09/2010 22:57, Alexey Proskuryakov wrote:
>> 
>> On 17.09.2010, at 14:28, David Carlisle wrote:
>> 
>>> No the code point is in the math symbols block and was always
>>> intended for math usage. Some time after the code point was added
>>> (I think, I don't have the data to hand) a canonical mapping to the
>>> 3xxx block was added to it; that was an error that the Unicode
>>> consortium is now trying to correct (or at least back when unicode
>>> 3.x added this new character)
>> 
>> I cannot follow this argument. My understanding is that adding a
>> single character canonical decomposition implies deprecation in
>> Unicode, so describing this as a two-step process confuses me.
> 
> Adding a canonical decomposition doesn't imply deprecation.
> Depending on which canonical form is chosen, the canonicalisation mapping can 
> go either way: loosely speaking, some forms prefer composite characters, some 
> use combining characters in preference (not that combining characters are 
> involved here).
> 
> 2329 was deprecated some years after the canonical mapping was added,
> because it was realised that that mapping was wrong, but mappings are never 
> changed once added. It became deprecated not when the mapping to 3008 was 
> added but when it was replaced by 27E8.
> I described it as a two step process because it happened in two stages.
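
The first of the two stages described above is recorded in the Unicode Character Database and can be inspected directly. A minimal sketch, assuming Python 3 and its standard unicodedata module; the constant and function names are illustrative and do not come from the thread. It prints the decomposition field for the three left-bracket code points under discussion.

```python
import unicodedata

# The three code points discussed in the thread (left-hand brackets only).
DEPRECATED_LEFT = "\u2329"  # LEFT-POINTING ANGLE BRACKET
CJK_LEFT = "\u3008"         # LEFT ANGLE BRACKET (CJK punctuation)
MATH_LEFT = "\u27e8"        # MATHEMATICAL LEFT ANGLE BRACKET


def canonical_mapping(ch: str) -> str:
    """Return the decomposition field for ch; empty string if there is none."""
    return unicodedata.decomposition(ch)


if __name__ == "__main__":
    for ch in (DEPRECATED_LEFT, CJK_LEFT, MATH_LEFT):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"decomposition={canonical_mapping(ch)!r}")
```

U+2329 reports a decomposition of 3008, while U+3008 and U+27E8 report none. The database field shows only the mapping; the deprecation of 2329 is recorded separately and is not visible through this call.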
> 
> 
> 
>> At the time I looked at this (and also currently) the deprecated
>> character had a canonical decomposition that made it equivalent to a
>> CJK character. Any software that treats this character as a math one
>> clearly violates many versions of the Unicode specs, including the
>> current one. It might have been conformant to Unicode 2.0 or some
>> earlier version though.
> 
> It was conformant to Unicode 2, yes. The fact that Unicode then added a 
> canonical form to 3xxx doesn't make them non-conformant: systems don't have 
> to use NFC form and they don't have to use any particular glyph, so for 
> either reason it's perfectly conformant to use a math character for 2329.
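
To illustrate what happens when a system does choose to normalise, here is a sketch, again assuming Python 3's standard unicodedata module. The helper names normalize_all and show are hypothetical. It runs a string bracketed by 2329/232A, and one bracketed by 27E8/27E9, through each of the four normalisation forms.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_all(text: str) -> dict:
    """Map each normalisation form name to the normalised version of text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def show(text: str) -> str:
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


if __name__ == "__main__":
    for sample in ("\u2329x\u232a", "\u27e8x\u27e9"):
        print("input:", show(sample))
        for form, result in normalize_all(sample).items():
            print(f"  {form}: {show(result)}")
```

Because the mapping from 2329 to 3008 is a single-character (singleton) decomposition, every one of the four forms, NFC included, rewrites 2329/232A to 3008/3009. The 27E8/27E9 pair passes through unchanged. Text that is never normalised keeps 2329 as written.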
> 
>> 
>>> the lang and rang entity names come from the ISO math entity sets to
>>> denote math angle brackets. These sets and these names predate
>>> Unicode and predate HTML, it's unfortunate that after the names
>>> were mapped to unicode a canonical mapping to a different character
>>> was added, but
>> 
>> I don't see how the origins of the debate change the fact that these
>> Unicode fonts you mentioned violated the Unicode spec. They may have
>> been doing "the right thing" or not, but arguing that they didn't
>> violate the letter of the spec seems strange.
> 
> I don't think they violate the spec at all, except insofar as the spec was 
> internally inconsistent once it had added a canonical mapping between two 
> separate characters.
>> 
>> Clearly, I have a different perspective, since I don't think that
>> things that pre-date HTML and Unicode should have much weight in
>> today's decisions.
> 
> The point is that there have been documents using those entities as math 
> character names in continuous use since the '80s; why should they all be 
> broken? Not to mention the fact that the vast majority of use of those 
> entities in html will also be expecting a mathematical bracket (even if on 
> some systems, with some fonts the character glyph used was actually designed 
> for CJK punctuation).
> 
> In fact where classical ISO usage and HTML usage differed I followed HTML 
> usage in all cases (for all the obvious reasons), even when the HTML 
> definitions make no sense at all (e.g. asymp), but in this case
> external factors (i.e. Unicode moving the goalposts) meant that the "new" 
> Unicode 3.2 character should be used here.
> 
>> 
>>> the only fix the UTC suggest for that is just not using 2329 at all
>>> and use 27E8 instead. Which is what the entity spec recommends.
>> 
>> 
>> Did they actually suggest to use it for the lang entity in HTML, or
>> did they suggest to use it when a math character is desired?
> 
> XHTML entities have document scope; it is not possible for an xhtml+mathml 
> document to have different definitions for HTML and MathML use. But even for 
> pure HTML use it is fairly clear that 27E8 is the correct choice.
> 
> rang was never defined to be 3009; it was defined to be 232A and documented 
> as being a math angle bracket. Unicode have deprecated 232A and suggest that 
> any uses of that be replaced by 27E9 because 232A is effectively unusable as 
> it is subject to an essentially accidental and incorrect normalisation to 
> 3009.
> 
> It would be bizarre in the extreme to redefine rang to be 3009 (is there any 
> evidence of anyone ever having used that entity name and wanting a CJK 
> character?). The choices are doing what Unicode has suggested (since Unicode 
> 3.2) and using 27E9 instead, or declaring that changing the HTML entities is 
> just too scary, leaving it as 232A, and living with the fact that this will 
> be inconsistently rendered, violates the W3C/Unicode charmod normal form 
> rules, and goes directly against the deprecation of this character in the 
> Unicode specification. Of those two choices, defining it to be 27E9 seems to 
> be the lesser of two evils.
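
The two choices can be compared side by side. As an illustration from outside the thread: Python's standard html.entities module carries both an HTML 4 era table (name2codepoint) and an HTML5 table (html5). This sketch assumes Python 3; the variable names are illustrative.

```python
import html
from html.entities import html5, name2codepoint

NAMES = ("lang", "rang")

# HTML 4 era definitions: entity name -> integer code point.
html4_points = {name: name2codepoint[name] for name in NAMES}

# HTML5 definitions: keys include the trailing semicolon, values are strings.
html5_points = {name: ord(html5[name + ";"]) for name in NAMES}

if __name__ == "__main__":
    for name in NAMES:
        print(f"&{name}; HTML 4: U+{html4_points[name]:04X}, "
              f"HTML5: U+{html5_points[name]:04X}")
    # html.unescape follows the HTML5 table.
    decoded = html.unescape("&lang;x&rang;")
    print(" ".join(f"U+{ord(c):04X}" for c in decoded))
```

In that module the older table maps lang and rang to 2329 and 232A, while the HTML5 table and html.unescape use 27E8 and 27E9. Neither table maps these names to 3008 or 3009.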
> 
> David
> 
>> 
>> - WBR, Alexey Proskuryakov
>> 
>> 

- WBR, Alexey Proskuryakov


_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
