Begin forwarded message:
> From: David Carlisle <[email protected]>
> Date: September 17, 2010 18:02:39 Pacific Daylight Time
> To: Alexey Proskuryakov <[email protected]>
> Subject: Re: [webkit-dev] Fwd: Fwd: HTML5 & MathML3 entities
>
> On 18/09/2010 00:05, Alexey Proskuryakov wrote:
>>
>> On 17.09.2010, at 15:32, David Carlisle wrote:
>>
>>> adding a canonical decomposition doesn't imply deprecation.
>>> Depending on which canonical form is chosen, the canonicalisation
>>> mapping can go either way, loosely speaking some forms prefer
>>> composite characters, some use combining characters in preference
>>> (not that combining characters are involved here)
>>
>> This is not accurate. For singleton decomposition, both NFC and NFD
>> contain the decomposed form. See Unicode 5.2.0 section D113 (full
>> composition exclusion) for details.
>
> yes NFC and NFD are the same in these cases, but that doesn't really change
> the main point that deprecation here is nothing to do with the character
> having a different normal form. Compare
> ANGSTROM SIGN (212B) and
> LATIN CAPITAL LETTER A WITH RING ABOVE (00C5)
> these are similarly related by canonical form and so clearly C5 is preferred
> but 212B is not deprecated in the same way as 2329 is.
> see the entry for 212B in
>
> http://www.unicode.org/charts/PDF/U2100.pdf
>
> 2329 is deprecated because it is replaced by 27E8 not because it
> maps to something else in NFC.
>
>>
>>> 2329 was deprecated some years after the canonical mapping was
>>> added because it was realised that that mapping was wrong, but
>>> mappings are never changed once added. It became deprecated not
>>> when the mapping to 3008 was added; it became deprecated when it
>>> was replaced by 27E8. I described it as a two step process because
>>> it happened in two stages.
>>
>> Because of the above, I don't see how it could happen in two stages.
>> Adding a singleton decomposition logically implies deprecation.
>> And it wasn't until Unicode 5.2 that "deprecated" had a clearly defined
>> meaning anyway.
>
> If that were the meaning of deprecated in this case, the deprecated character
> would be deprecated in favour of its canonically equivalent character but
> that isn't the case. It is deprecated _because_ that incorrect decomposition
> exists, and is deprecated in favour of a new character added specifically to
> avoid the problem.
>
>>
>>> It was conformant to unicode 2 yes, the fact that unicode then
>>> added a canonical form to 3xxx doesn't make them non conformant,
>>> systems don't have to use NFC form and they don't have to use any
>>> particular glyph, so for either reason it's perfectly conformant to
>>> use a math character for 2329.
>>
>> Again, both composition and decomposition of U+2329 produce U+3008.
>
> Yes but a system isn't obliged to compose or decompose (and most do not
> automatically in my experience)
>
>>
>>> The point is that there have been documents using those entities as
>>> math character names in continuous use since the '80s why should
>>> they all be broken? Not to mention the fact that the vast majority
>>> of use of those entities in html will also be expecting a
>>> mathematical bracket (even if on some systems, with some fonts the
>>> character glyph used was actually designed for CJK punctuation).
>>>
>>> In fact where classical ISO usage and HTML usage differed I
>>> followed HTML usage in all cases (for all the obvious reasons) even
>>> when the HTML definitions make no sense at all (eg asymp) but in
>>> this case external factors (ie Unicode moving the goalposts) meant
>>> that the "new" Unicode 3.2 character should be used here.
>>
>> Do these documents use the entities with the same "&...;" notation?
>
> yes, of course.
>
>> MathML didn't exist in the 80's, so what are the documents that
>> actually conflict with HTML, or with compound XHTML documents?
>
> Well the point of breaking the mathml (and html) entities into a separate
> spec was to get a uniform set of definitions across different uses. If (as
> was the case) the same entity name (used via the same syntax) means different
> things in docbook, mathml and html, then formally you may argue that
> everything is OK and consistent, each document obeys its own language
> definition, but in practice moving fragments between documents results in
> silent data corruption.
> the entity spec was separated out from the mathml spec in 2003 and went
> through numerous public revisions, people in the old and the new HTML groups
> were asked to comment on it, as were people on the UTC/Unicode list and people
> on the original ISO working groups who defined the entity names originally.
> After 7 years of open review it went to REC earlier this year (and MathML3
> depending on it will hopefully go to REC this month)
>
>
>>>>> the only fix the UTC suggest for that is just not using 2329 at
>>>>> all and use 27E8 instead. Which is what the entity spec
>>>>> recommends.
>>>>
>>>>
>>>> Did they actually suggest to use it for the lang entity in HTML,
>>>> or did they suggest to use it when a math character is desired?
>
> the comments were in relation to the entities draft which has the explicit
> intention of being a common set of definitions for any uses of these entity
> names.
>
>>> xhtml entities have document scope it is not possible for an
>>> xhtml+mathml document to have different definitions for html and
>>> mathml use, but even for pure html use it is fairly clear that 27e8
>>> is the correct choice.
>>
>> I wasn't asking about HTML vs. XHTML - both used to define &rang; in
>> the same way.
>
> The same way as MathML2, actually. This change isn't about matching XHTML or
> MathML2, it's about tracking changes to Unicode.
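An illustrative aside on the entity definitions discussed above: Python's standard library ships two entity tables, `html.entities.name2codepoint` (documented as the HTML 4 definitions) and the HTML5 named character references that `html.unescape` consults. The sketch below only reports what those two tables return for `lang` and `rang`. The helper name `describe` is invented for this note, and the tables are Python's copies of the lists, not the specifications themselves.

```python
import html
import unicodedata
from html.entities import name2codepoint


def describe(name):
    """Return (HTML 4 table code point, HTML5 table code point) for an entity name."""
    html4 = name2codepoint[name]                  # HTML 4 style table
    html5 = ord(html.unescape("&" + name + ";"))  # HTML5 named references
    return html4, html5


if __name__ == "__main__":
    for name in ("lang", "rang"):
        old, new = describe(name)
        print(
            f"&{name}; "
            f"HTML4 table: U+{old:04X} {unicodedata.name(chr(old))}; "
            f"HTML5 table: U+{new:04X} {unicodedata.name(chr(new))}"
        )
```

If the two columns differ for a name, a fragment copied between documents that follow different tables would silently change character, which is the data-corruption concern raised in the message.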
>
>> I can re-phrase my question as "Did they actually
>> suggest to use it for the lang entity in (X)HTML, or did they suggest
>> to use it when a math character is desired?"
>>
>
>>
>> I don't think that characterizing what we did in WebKit as bizarre in
>> the extreme is fair.
>
> fair or not, I think it was clearly the wrong thing to do (even if well
> intentioned) nothing in HTML or XHTML specifications would licence such a
> definition. You could claim perhaps that you were using HTML followed by NFC
> normalisation, but that's a very weak argument I think.
>
> The Unicode spec (or at least the code chart page at
> http://www.unicode.org/charts/PDF/U2300.pdf which is what I have to hand)
> doesn't say it is deprecated in favour of 3009 it says that it is deprecated
> _because_ of the equivalence to CJK punctuation and that mathematical use is
> strongly recommended to use 27e8 instead.
>
> It is very hard to think that anyone using CJK characters (and so presumably
> with access to some convenient keyboarding scheme for those code ranges)
> suddenly requires an ascii entity name reference to access a punctuation
> character. Conversely mathematical usage habitually uses long ascii names for
> characters. It is clear that rang and lang have always been intended as
> mathematical characters, and I ask again whether you really think that
> (barring artificial test cases) anyone writing in CJK languages uses these
> english ascii entity references for just those two characters? I don't see
> how it is possible to read Unicode as saying anything other than rang ought
> to point to 27e8
>
> Unicode technical report 25 says
>
> Unicode 3.2 added two new mathematical angle bracket characters ⟨ ⟩ (U+27E8
> and U+27E9) that are unequivocally intended for mathematical use and should
> be used instead of U+2329 and U+232A.
>
>
> David
>

- WBR, Alexey Proskuryakov

_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
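For readers who want to check the normalisation claims in the thread themselves, here is a minimal sketch using Python's `unicodedata` module. It assumes the Unicode data bundled with the interpreter is recent enough to include U+27E8 and U+27E9 (added in Unicode 3.2). The names `CODE_POINTS` and `forms` are invented for this note.

```python
import unicodedata

# Code points mentioned in the thread: the deprecated angle brackets,
# their mathematical replacements, and ANGSTROM SIGN for comparison.
CODE_POINTS = (0x2329, 0x232A, 0x27E8, 0x27E9, 0x212B)


def forms(cp):
    """Return the raw decomposition field plus NFC and NFD as lists of hex code points."""
    ch = chr(cp)
    return {
        "decomposition": unicodedata.decomposition(ch),
        "NFC": [f"{ord(c):04X}" for c in unicodedata.normalize("NFC", ch)],
        "NFD": [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)],
    }


if __name__ == "__main__":
    for cp in CODE_POINTS:
        info = forms(cp)
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {info}")
```

A character with a singleton decomposition never survives either normal form, which is the point made about U+2329. The output also shows that the replacement characters U+27E8 and U+27E9 carry no decomposition at all, so normalisation leaves them untouched.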

