Re: comments on XHTML Modularization 1.1 from XML Schema WG

Steven Pemberton Wed, 22 Aug 2007 05:21:13 -0700


Dear Michael, and other colleagues,


Thank you for your belated last call comments on XHTML Modularization 1.1.
 http://lists.w3.org/Archives/Public/www-html-editor/2007JanMar/0035

To return the favour, here is our belated reply :-)/2 (largely caused by our rechartering, which happened after your comments arrived).


2.1. Charset type

    Charset is defined as a vacuous restriction of xsd:string. That may
    be the right thing to do, but it seems likely that a better
    definition can be formulated.
[...]
    A more ambitious definition might mention all of the values in the
    IANA type registry, but the result, when examined, is rather long
    and not really very informative — rather like the registry itself
    — and it is not included here.

While we agree on the principle of validating as much as possible, we are wary of duplicating someone else's list in a specification: we run the risk of making the schema brittle and in need of regular updating.


2.2. Color type

    Two things seem puzzling in the current definition of Color: (1) it
    allows any NMTOKEN, rather than just the sixteen well known color
    names. And (2) while six-digit hexadecimal values are allowed,
    three-digit values are not allowed. (The description of Color in
    HTML 4.01 (<URL:http://www.w3.org/TR/html401/types.html#h-6.5>)
    doesn't actually specify how many digits are to be used for hex
    color values.)

Three-digit hex colour values were introduced in CSS, and are not actually a part of HTML; in fact we agree that the HTML definition is a little unclear, and only seems to suggest what the correct values are through examples. The problem is, with legacy content now on the web, it is difficult to say whether colour "#FAB" should be interpreted as "#FFAABB" as it is in CSS, or "#000FAB" as would be suggested if you interpret the value as "a hexadecimal number", which is what the specification says it is. Since the 6-digit version is the only likely interoperable one, we prefer to keep it at that. As for the sixteen well-known values, while these are defined in the HTML4 specification, many other values are now extant and interoperable on the web (and remember that Modularization is for a whole family of languages, not just HTML4 derivatives).
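
For illustration only, the six-digit restriction described above could be written roughly as follows. This is a sketch rather than the definition in the Modularization schema: the helper type name SixDigitHexColor and the namespace-less schema wrapper are assumptions made so that the fragment stands alone.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical helper type: '#' followed by exactly six hex digits. -->
  <xs:simpleType name="SixDigitHexColor">
    <xs:restriction base="xs:token">
      <xs:pattern value="#[0-9A-Fa-f]{6}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Sketch of Color: a six-digit hex value, or any name token. -->
  <xs:simpleType name="Color">
    <xs:union memberTypes="SixDigitHexColor xs:NMTOKEN"/>
  </xs:simpleType>

</xs:schema>
```

Under this sketch "#FFAABB" is accepted by the first member type, while "#FAB" matches neither member ('#' is not a name character), so the ambiguous three-digit form is simply rejected rather than guessed at.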


2.3. ContentType

    Like Charset, this could be defined as a union whose first member(s)
    recognize well-known values defined by the RFCs or in the IANA
    registry and whose final type (here xsd:string) takes care of
    extensibility. It's not clear to me whether the values are in fact
    limited by the RFC to ASCII characters; if so, xsd:string is a bit
    too broad.

We are considering this change for a future revision.
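
As a rough picture of the kind of union suggested in the comment, assuming a handful of media types picked purely as examples (the type name WellKnownContentType is hypothetical, and the list is in no way meant to mirror the IANA registry):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical first member: a few well-known values, for documentation. -->
  <xs:simpleType name="WellKnownContentType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="text/html"/>
      <xs:enumeration value="text/css"/>
      <xs:enumeration value="image/png"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Final member xs:string keeps the type open to any other value. -->
  <xs:simpleType name="ContentType">
    <xs:union memberTypes="WellKnownContentType xs:string"/>
  </xs:simpleType>

</xs:schema>
```

Note that because the last member is xs:string, such a union accepts exactly the same values as plain xs:string; what it adds is documentation of the recognized values, not stricter validation.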

2.4. Coords type

    Since the possible values of Coords values are so clearly specified
    in the spec, it seems a shame not to define the type a little more
    tightly.

This seems like a reasonable suggestion.

2.5. FPI type
[...]
    The pattern is then quite simple:

  <xsd:simpleType name="FPI">
   <xsd:restriction base="xsd:normalizedString">
    <xsd:pattern value="&fpi;"/>
   </xsd:restriction>
  </xsd:simpleType>

Looks good.

2.6. FrameTarget type

    The HTML spec
    (<URL:http://www.w3.org/TR/html401/types.html#h-6.16>) seems to
    want a slightly tighter definition of frame target names. Perhaps
    something like the following should be used.

Good idea.

2.7. LinkTypes type

    LinkTypes is a good example of a type with what is sometimes called
    a ‘semi-open’ list of values. Some set of well-known values is
    defined, which software is encouraged to recognize and which authors
    are encouraged to use when appropriate, but for strict validity, a
    much larger set of values is allowed.

    In such cases, it's good practice to document the recognized types
    in the type definition. Since the well known values here are case
    insensitive, that's best done with a list of patterns rather than
    with an enumeration:

Frankly this looks rather like overkill to us. These values are intended only to be an initial set, with many more expected to be used, so we don't really see the value-add of including these few in the schema (especially since it is not really readable).


2.8. Tightening other types

In general we agree that closed sets of values should be more tightly defined; we are not so enamoured of defining values of open sets, since there is no validation win.
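
By way of illustration, a closed set such as the shape values defined in HTML 4.01 (rect, circle, poly, default) is the kind of thing an enumeration handles well. A minimal sketch follows; the schema wrapper and the choice of xs:token as the base type are assumptions, not the published definition.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Closed set: any value outside these four fails validation. -->
  <xs:simpleType name="Shape">
    <xs:restriction base="xs:token">
      <xs:enumeration value="rect"/>
      <xs:enumeration value="circle"/>
      <xs:enumeration value="poly"/>
      <xs:enumeration value="default"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

Here the enumeration buys real validation, which is the difference from an open set, where any list in the schema can only ever be advisory.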


2.9. Named model groups vs. substitution groups

    We reiterate our advice of four years ago: the definition of the
    XHTML vocabulary would be easier to follow, and it would be easier
    to extend it, if the schema documents used substitution groups
    wherever feasible.

    If you have had specific problems applying substitution groups to
    XHTML, we would very much like to know what they were; we can
    speculate, but would prefer to hear from you.

The people who produced the schema felt the approach used here to be the most consistent with Modularization in general, and the one most likely to work. However, we take your advice seriously, and would like to adopt it. In order to allow modularization to proceed without too much more delay, though, we will not adopt this (rather drastic) change in this version, but save it for the planned version 2.


2.10. Adding attributes

    It's not clear that the way modules add attributes works. For
    example, the client side image map module adds attributes to the img
    element. All well and good, but looking at the schema I see an
    attribute group defined:

   <!-- modify img attribute definition list -->
      <xs:attributeGroup name="xhtml.img.csim.attlist">
          <xs:attribute name="usemap" type="xs:IDREF"/>
      </xs:attributeGroup>

    I can't see where this actually is used anywhere in the schema. I
    think what the module should be doing is a redefine of the groups.

The extension mechanisms get used in the 'drivers' which define a language on the basis of the modules. There is no driver supplied with modularization; you need to look at a particular language's use of Modularization to see these in use.


2.11. A missing scenario

    One important scenario that seems to be missing is just plonking
    bits of the XHTML namespace into specific places in some other
    namespace. Maybe it's too obvious/easy, but it is actually the most
    common scenario. e.g. MyOwnLanguage has its own things, and I'll
    just put some XHTML inline elements here.

    Introducing XHTML elements into the xsd:documentation elements in a
    schema document is another instance of the scenario.

We have a concept of 'integration sets' which allow this usage. What we will do is add an example to the spec to show how to do this, to make it clearer.
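
To make the scenario concrete, here is one generic XML Schema way of letting XHTML elements appear inside another language. It is only a sketch using a plain namespace wildcard, not the integration-set mechanism itself, and it does not restrict the content to inline elements; the namespace http://example.org/my-own-language and the element name note are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="http://example.org/my-own-language"
           targetNamespace="http://example.org/my-own-language"
           elementFormDefault="qualified">

  <!-- Hypothetical host element: text mixed with any XHTML-namespace elements. -->
  <xs:element name="note">
    <xs:complexType mixed="true">
      <xs:sequence>
        <!-- lax: validate XHTML children only if declarations are available. -->
        <xs:any namespace="http://www.w3.org/1999/xhtml"
                processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

A host language that wants only particular XHTML elements in particular places would need something more selective than this wildcard.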


3.1. Make the introduction less DTD-specific

This should be much better now.

3.2. The term PCDATA

Fixed.

3.5. Shape type

    Shouldn't the overview in section 4.3 say that Shape has just the
    four values rect, circle, poly, and default?

Yes, it should, and will.

3.6. White space in the document source

Thanks. We will do a clean up prior to publication.

4.1. Testing the schema documents
[...]
    [Later information from Shane McCarron is that this spec doesn't
    provide a driver, but that
    <URL:http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd> might be
    consulted as an example. To be followed up ...)

Indeed, the Modularization spec doesn't include any drivers. We have added an informative link to one.


4.2. Where is the html element?

    Where is the html element defined?

It is in the structure module.

   (And, for the instruction of those seeking to understand
    how to use these modules, a pointer to the XHTML 1.1 driver modules
    would be very useful.

Done.

    But the issue appears to at least some readers as at least partly
    substantive: that is, it seems to us that a specification describing
    a modular definition of the XHTML 1.1 vocabulary ought, in the
    nature of things, to include a top-level driver module which calls
    in all the others.

Coming from a group that didn't include a mechanism to specify what the root element is, I am shocked! But seriously, this is modularization 1.1, not the modularization of XHTML 1.1. Modularization 1.1 is and will be used by many different languages. (See for instance

   http://www.w3.org/MarkUp/Group/2007/xhtml-modularization-11-implementation
)

4.3. Case insensitivity and XML Schema patterns or enumerations
[...]
    Given that many regex libraries already have such flags, such an
    addition wouldn't seem to be difficult for implementors.
    Should the XML Schema Working Group consider such a change?

It would make certain declarations easier to write, and make them actually readable.
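
To show what is meant by readability, this is roughly what case-insensitive matching of a few link type values looks like with today's patterns. It is a sketch only: the type name WellKnownLinkType is hypothetical and the three values are just a sample.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Several patterns in one restriction step are alternatives (ORed). -->
  <xs:simpleType name="WellKnownLinkType">
    <xs:restriction base="xs:NMTOKEN">
      <!-- "alternate", in any mix of upper and lower case -->
      <xs:pattern value="[Aa][Ll][Tt][Ee][Rr][Nn][Aa][Tt][Ee]"/>
      <!-- "stylesheet" -->
      <xs:pattern value="[Ss][Tt][Yy][Ll][Ee][Ss][Hh][Ee][Ee][Tt]"/>
      <!-- "next" -->
      <xs:pattern value="[Nn][Ee][Xx][Tt]"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

Without the comments it takes some effort to see which words are being matched, which is the point: a case-insensitivity flag would let each of these be written as the plain word.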


    And if so, what is to be done about Unicode characters for which the
    upper/lowercase mapping is not 1:1? And what should be done about
    title case?

Ha! You're asking the wrong people...

Thanks for the comments.

Best wishes,

Steven Pemberton
For the XHTML2 Working Group
