From: "Doug Ewell" <[EMAIL PROTECTED]>
> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
> > Shamely,
> I wish I knew which real English word you mean by this. "Shamefully"?
> "Sadly"? "Unfortunately"? "Embarrassingly"?
I know that I use this word instead of "unfortunately". I don't know where I learnt it, but I use it frequently...

> > the idea of "block-level" and "inline" elements is specific to HTML,
> > but HTML today is an application of XML, and the problem must be
> > solved at the XML level.
>
> HTML is not an application of XML. HTML and XML are both applications
> of SGML. XHTML, which I use and recommend, is an application of HTML
> *to* XML.

You did not need to specify this. I said "TODAY", which means the *current* standard version of HTML, which is now XHTML, i.e. really an application of XML (the legacy syntax with unclosed elements and unquoted attribute values, allowed in HTML and SGML, is being deprecated, as it is forbidden in XML)...

What I mean here is that a solution for disambiguating the grapheme cluster boundaries that collide, during normalization, with the ?ML lexical analysis, and that works with the restricted XML syntax, will then also work with XHTML, HTML4 or lower, or even with SGML, which is the ancestor of the family. It's a place where the W3C (for XML, XHTML and HTML4 or lower) and the SGML consortium can make recommendations.

Of course there's Unicode Technical Report #20, which covers the case of XML. For Unicode, it is informative; the most important thing is that this document is co-signed by the W3C, on 13 June 2003, and so is now an appropriate (but incomplete) response of the W3C to this problem. UTR#20 does not completely cover the subject, as it still says nothing about the change in Unicode 4.0.1 related to the use of ZW(N)J in rule D17 and related rules... Maybe Martin Dürst of the W3C should look closely at the effect of D17 and at whether UTR#20 should be updated...

I don't know if there's a similar recommendation from the SGML consortium. Similar problems may also exist in other languages or protocols that use Unicode and that are now possibly exposed to this change, which may break their existing syntax.
In some of these cases, a solution based on NCRs will not be so easy to find, and these other protocols or languages using Unicode may need to apply further restrictions on what they consider to be "valid Unicode strings", or may simply choose NOT to apply the D17 change (so that a string containing only a ZW(N)J character will still be valid and won't collide with the language syntax).
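
To make the kind of collision I am talking about concrete, here is a small Python sketch. It is only an illustration under my own assumptions: the fragment, the choice of U+0338 and the helper name protect_combining are hypothetical, and the helper only looks at the first character of the content and only at characters with a non-zero combining class, so it says nothing about ZWJ/ZWNJ (whose combining class is 0) or about the D17 change itself. It shows a combining character placed right after the ">" of a tag being merged into that ">" by NFC normalization, and the same content surviving when it is written as an NCR.

```python
import unicodedata

# Hypothetical fragment: the element content starts with U+0338
# (COMBINING LONG SOLIDUS OVERLAY), directly after the ">" of the start tag.
raw = "<p>\u0338</p>"

# NFC composes ">" + U+0338 into U+226F (NOT GREATER-THAN),
# so the tag delimiter disappears from the normalized text.
nfc = unicodedata.normalize("NFC", raw)


def protect_combining(content: str) -> str:
    """Write a leading combining character as a numeric character reference.

    Illustrative helper (not taken from any specification): it checks only
    the first character and only a non-zero canonical combining class.
    """
    if content and unicodedata.combining(content[0]) != 0:
        return f"&#x{ord(content[0]):X};" + content[1:]
    return content


# With the NCR, the markup is pure ASCII and normalization leaves it alone.
safe = "<p>" + protect_combining("\u0338") + "</p>"
safe_nfc = unicodedata.normalize("NFC", safe)

print(ascii(nfc))
print(safe_nfc)
```

A language without any NCR-like escape has no equivalent of protect_combining, which is why it would have to restrict its set of valid strings instead.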

