Re: [whatwg] id and xml:id

2007-06-08 Thread Ian Hickson
On Sun, 2 Apr 2006, Henri Sivonen wrote:
>
> Since UAs handle whitespace in the id attribute inconsistently (see 
> below)

Note that there is interoperability (in that we have two browsers that do 
the same thing, and one of those is even IE).


> old specs imply or require whitespace trimming

Old specs imply or require a lot of things. ;-)


> and ids with whitespace are unreferenceable from whitespace-separated 
> lists of ids,

True.
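
A rough illustration of that point, as a Python sketch. The helper name parse_id_list is made up for this example and does not come from any spec, and Python's split() treats more characters as whitespace than HTML does. Once a list of ids has been split on whitespace, no token can contain whitespace, so an id such as ' c ' can never be matched:

```python
def parse_id_list(value):
    """Split a whitespace-separated list of ids into tokens (hypothetical helper)."""
    # str.split() with no arguments splits on runs of whitespace and
    # discards leading and trailing whitespace.
    return value.split()


tokens = parse_id_list("a  c \td")

# An id whose value contains whitespace can never equal one of the tokens,
# while its trimmed form can.
print(" c " in tokens)
print("c" in tokens)
```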


> I suggest adding the following language concerning document conformance:
> 
> The value of the id attribute must be a string that consists of one or 
> more characters matching the following production: 
> [#x21-#xD7FF]|[#xE000-#xFFFD]|[#x10000-#x10FFFF] (any XML 1.0 character 
> excluding whitespace).

I've made it non-conforming for an ID to contain a whitespace character.
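
As a sketch only, the suggested production (any XML 1.0 character excluding whitespace, one or more times) could be checked like this in Python. The function name is_conforming_id is an assumption made for this example, and the spec's own definition of whitespace may differ from the XML 1.0 one used here:

```python
import re

# One or more XML 1.0 characters excluding whitespace:
# U+0021-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF.
ID_VALUE = re.compile(r"[\u0021-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]+")


def is_conforming_id(value):
    """Return True if the whole value matches the production (hypothetical helper)."""
    return ID_VALUE.fullmatch(value) is not None


# The same values as in the test case quoted further down.
for candidate in ["a", "2", "<", ",", "ä", " c ", "\nd\n", "\t\te\t\t"]:
    print(repr(candidate), is_conforming_id(candidate))
```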


> Also, I suggest requiring that elements must not have both id and xml:id 
> and requiring that xml:id must not occur in the HTML serialization. 
> (Again, from the document conformance point of view--not disputing 
> requirements on browsers.)

I don't really want to mention xml:id. If someone wants to write a spec 
that affects our spec, that's their business. I don't think it makes sense 
for us to go ahead and then ban their spec. That's not to say that xml:id 
is good or bad, it just doesn't seem relevant to mention it in our spec.


> If an element had both an id attribute and an xml:id attribute with different
> values, the document would not be HTML-serializable, which would be bad.

That applies to any document that has nodes from other namespaces. xml:id 
isn't special in that sense.


> If an element was allowed to have an id attribute and an xml:id attribute with
> the same value, the following constraint from xml:id spec would be violated
> even for conforming docs:
> "An xml:id processor should assure that the following constraint holds:
> * The values of all attributes of type “ID” (which includes all xml:id
> attributes) within a document are unique."
> ( http://www.w3.org/TR/xml-id/ )

I don't really understand what you mean there.


> Finally, as the ultimate ID nitpicking, the spec should state that it is 
> naughty of authors to turn attributes other than id and xml:id into IDs 
> via the DTD. (Well, using a DTD at all is naughty. :-)

Again, if they want to do that, that's their business. I don't see that as 
a big problem.


> Test case: http://hsivonen.iki.fi/test/wa10/adhoc/id.html
> The script tries every id with a whitespaceless value to see if whitespace is
> trimmed before ID assignment.
>
> Safari and IE 6:
> 
> id='a' PASS
> id='2' PASS
> id='<' PASS
> id=',' PASS
> id='ä' PASS
> id=' c ' FAIL
> id='\nd\n' FAIL
> id='\t\te\t\t' FAIL
> id='
f
' FAIL

That's what the spec requires today.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] id and xml:id

2006-04-12 Thread Robin Berjon

On Apr 12, 2006, at 14:25, Henri Sivonen wrote:

On Apr 4, 2006, at 19:35, fantasai wrote:
That seems odd. You should be able to say "the content model of this element
is anything".
http://books.xmlschemata.org/relaxng/relax-CHP-12-SECT-2.html#relax-CHP-12-SECT-2.1


From the spec:
"Thus, a RELAX NG schema that is compatible with this feature
implies a mapping from element/attribute name pairs onto an ID-type,
and hence a mapping from attributes in the instance onto ID-types."

( http://relaxng.org/compatibility-20011203.html )

"Any attribute" with ID-type null on "any element" competes with
attribute id with ID-type ID on element foo. That's why the
attributes with non-null ID-type paired with known elements would
need to be subtracted from the "any" set.


Yes, that is IMHO the single biggest PITA with RNG. I wish that
restriction were removed but it seems unlikely. For the SVG Tiny 1.2
RNG I worked around this by having a schema that never mentions the
fact that you can have anything from foreign namespaces and using
NVDL to express the way in which vocabularies could be mixed. You can
look at it at
http://www.w3.org/TR/SVGMobile12/conform.html#ConformingSVGDocuments.


I think that a conformance checker / schema should forbid element  
children of 

Re: [whatwg] id and xml:id

2006-04-12 Thread Henri Sivonen

On Apr 4, 2006, at 19:35, fantasai wrote:


Henri Sivonen wrote:


I have now assessed the damage. It is not as bad as it looked like. :-)

Despite a flood of error messages, there were only three causes:
1) Can't have wild card attributes on wild card elements in the wild
card content models of the script and style elements. (Not a big deal.
It is reasonable to restrict them to known style and script languages.)


That seems odd. You should be able to say "the content model of this element
is anything".
http://books.xmlschemata.org/relaxng/relax-CHP-12-SECT-2.html#relax-CHP-12-SECT-2.1


From the spec:
"Thus, a RELAX NG schema that is compatible with this feature implies
a mapping from element/attribute name pairs onto an ID-type, and
hence a mapping from attributes in the instance onto ID-types."

( http://relaxng.org/compatibility-20011203.html )

"Any attribute" with ID-type null on "any element" competes with
attribute id with ID-type ID on element foo. That's why the
attributes with non-null ID-type paired with known elements would
need to be subtracted from the "any" set.


Anyway to get better on topic for this list:

I think that a conformance checker / schema should forbid element  
children of 

Re: [whatwg] id and xml:id

2006-04-12 Thread Henri Sivonen

On Apr 2, 2006, at 15:09, Anne van Kesteren wrote:


Quoting Henri Sivonen <[EMAIL PROTECTED]>:

Also, I suggest requiring that elements must not have both id and
xml:id and requiring that xml:id must not occur in the HTML
serialization. (Again, from the document conformance point of view--
not disputing requirements on browsers.)


How could it occur in a HTML document?


I meant having  in the serialization.


Finally, as the ultimate ID nitpicking, the spec should state that it
is naughty of authors to turn attributes other than id and xml:id
into IDs via the DTD. (Well, using a DTD at all is naughty. :-)


But through DOM methods is ok?


I guess if such DOM functionality is interoperable.


Test case: http://hsivonen.iki.fi/test/wa10/adhoc/id.html



Do you have a similar test for xml:id?


I now have:
http://hsivonen.iki.fi/test/wa10/adhoc/xml-id.xhtml
and id in XHTML:
http://hsivonen.iki.fi/test/wa10/adhoc/id.xhtml

The results are unexpected and interesting.

Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-us; rv:1.9a1)  
Gecko/20060411 Firefox/3.0a1


http://hsivonen.iki.fi/test/wa10/adhoc/id.html

id='a' PASS
id='2' PASS
id='<' PASS
id=',' PASS
id='ä' PASS
id=' c ' FAIL
id='\nd\n' PASS
id='\t\te\t\t' PASS
id='
f
' PASS

http://hsivonen.iki.fi/test/wa10/adhoc/id.xhtml

id='a' PASS
id='2' PASS
id='<' PASS
id=',' PASS
id='ä' PASS
id=' c ' FAIL
id='\nd\n' FAIL
id='\t\te\t\t' FAIL
id='
f
' FAIL

http://hsivonen.iki.fi/test/wa10/adhoc/xml-id.xhtml

xml:id='a' FAIL
xml:id='2' FAIL
xml:id='<' FAIL
xml:id=',' FAIL
xml:id='ä' FAIL
xml:id=' c ' FAIL
xml:id='\nd\n' FAIL
xml:id='\t\te\t\t' FAIL
xml:id='
f
' FAIL

Opera 9 build 3312 (OS X)

http://hsivonen.iki.fi/test/wa10/adhoc/id.html

id='a' PASS
id='2' PASS
id='<' PASS
id=',' PASS
id='ä' PASS
id=' c ' FAIL
id='\nd\n' PASS
id='\t\te\t\t' PASS
id='
f
' FAIL

http://hsivonen.iki.fi/test/wa10/adhoc/id.xhtml

id='a' PASS
id='2' PASS
id='<' PASS
id=',' PASS
id='ä' PASS
id=' c ' FAIL
id='\nd\n' FAIL
id='\t\te\t\t' FAIL
id='
f
' PASS

http://hsivonen.iki.fi/test/wa10/adhoc/xml-id.xhtml

xml:id='a' PASS
xml:id='2' PASS
xml:id='<' PASS
xml:id=',' PASS
xml:id='ä' PASS
xml:id=' c ' PASS
xml:id='\nd\n' PASS
xml:id='\t\te\t\t' PASS
xml:id='
f
' PASS

WebKit-SVN-r13820

http://hsivonen.iki.fi/test/wa10/adhoc/id.html

id='a' PASS
id='2' PASS
id='<' PASS
id=',' PASS
id='ä' PASS
id=' c ' FAIL
id='\nd\n' FAIL
id='\t\te\t\t' FAIL
id='
f
' FAIL

http://hsivonen.iki.fi/test/wa10/adhoc/id.xhtml

id='a' PASS
id='2' PASS
id='<' PASS
id=',' PASS
id='ä' PASS
id=' c ' FAIL
id='\nd\n' FAIL
id='\t\te\t\t' FAIL
id='
f
' FAIL

http://hsivonen.iki.fi/test/wa10/adhoc/xml-id.xhtml

xml:id='a' FAIL
xml:id='2' FAIL
xml:id='<' FAIL
xml:id=',' FAIL
xml:id='ä' FAIL
xml:id=' c ' FAIL
xml:id='\nd\n' FAIL
xml:id='\t\te\t\t' FAIL
xml:id='
f
' FAIL

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] id and xml:id

2006-04-04 Thread fantasai

Henri Sivonen wrote:


I have now assessed the damage. It is not as bad as it looked like. :-)

Despite a flood of error messages, there were only three causes:
1) Can't have wild card attributes on wild card elements in the wild
card content models of the script and style elements. (Not a big deal.
It is reasonable to restrict them to known style and script languages.)


That seems odd. You should be able to say "the content model of this element
is anything".
http://books.xmlschemata.org/relaxng/relax-CHP-12-SECT-2.html#relax-CHP-12-SECT-2.1

2) Jing complains about the IDREFness altering co-occurrence constraint
between valuetype and value on the param element.

3) It appears that in RELAX NG an attribute can't be allowed to take
the empty string if the attribute has the IDREFS nature. This is a
problem with the form attribute.

See: http://groups.yahoo.com/group/rng-users/message/422


Does moving the choice up higher help any?

~fantasai


Re: [whatwg] id and xml:id

2006-04-04 Thread Henri Sivonen

On Apr 3, 2006, at 18:37, Henri Sivonen wrote:

I spent quite a while today verifying (by implementing a more  
permissive ID datatype library) that James Clark's Jing agrees with  
my reading of the spec.


In case anyone is interested in playing with it, the datatype library  
(with source; MIT/expat license) is available from

http://hsivonen.iki.fi/test/permissive-ids.jar

The namespace is
http://hsivonen.iki.fi/datatype/permissive-id

The local names are ID, IDREF and IDREFS.

The jar is deployable by simply putting it in the CLASSPATH, but  
beware of the -jar switch which ignores the external CLASSPATH.


(Disclaimer: The datatype library is designed for diagnosing how a  
RELAX NG implementation works. It should not be used without  
modification in a production system. It prints to System.out and  
probably has slightly broken equality testing behavior.)


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] id and xml:id

2006-04-04 Thread Henri Sivonen

On Apr 3, 2006, at 18:37, Henri Sivonen wrote:

It appears that enabling ID/IDREF checking wreaks havoc with schemas
that have not been written with this in mind.


I have not yet assessed the extent of the damage, but it could turn  
out that ID/IDREF checking needs to go in a separate schema like  
exclusions.


I have now assessed the damage. It is not as bad as it looked like. :-)

Despite a flood of error messages, there were only three causes:
1) Can't have wild card attributes on wild card elements in the wild  
card content models of the script and style elements. (Not a big  
deal. It is reasonable to restrict them to known style and script  
languages.)


2) Jing complains about the IDREFness altering co-occurrence  
constraint between valuetype and value on the param element.


3) It appears that in RELAX NG an attribute can't be allowed to take  
the empty string if the attribute has the IDREFS nature. This is a  
problem with the form attribute.

See: http://groups.yahoo.com/group/rng-users/message/422

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] id and xml:id

2006-04-03 Thread Henri Sivonen

On Apr 3, 2006, at 00:00, fantasai wrote:


Henri Sivonen wrote:

On Apr 2, 2006, at 18:56, fantasai wrote:
I'd rather see the id attribute restricted to an NCName token insofar
as possible. We can make an exception for Hixie's repetition templates,
but otherwise I think it should be compatible with the XML ID syntax.

Do you mean common attrs should have a co-occurrence constraint that
changes the datatype of the id attribute if the repeat attribute is
present?


Yes. Or, at the very least, if the repetition module is loaded.


Changing id in some cases to an attribute that does not have the ID  
nature would be problematic, but see below.


I wasn't even expecting to be able to do IDREF integrity checks in
RELAX NG. I was planning on doing it in Schematron or Java. Besides,
general IDREF integrity checking does not check that, for example,
the form attribute references only form elements and not just any ids.


I would want that in the RelaxNG schema because there are editing tools
that hook into RelaxNG, but not many (or any besides validators) that can
hook into Schematron (Glazou, for example, is working on a RelaxNG-driven
editor.)


I agree that editor-friendliness is a worthy goal. I have been  
keeping it in mind, even though I have not actually been testing  
schemas in any RELAX NG-aware editor.


Schematron is not amenable to editor autocompletion features, but in  
*theory* it could be used for discovering errors by running the  
validation function over the document being edited from time to time.



RelaxNG /can/ do IDREF integrity checks.


It turns out that the ID nature in RELAX NG DTD Compatibility does
*not* require the ID value to be an NCName. That is a further
restriction imposed by the
http://relaxng.org/ns/compatibility/datatypes/1.0 and
http://www.w3.org/2001/XMLSchema-datatypes datatype libraries. The ID
nature itself only requires that the ID value does not contain
whitespace.


I spent quite a while today verifying (by implementing a more  
permissive ID datatype library) that James Clark's Jing agrees with  
my reading of the spec. It does, which is good evidence that my  
reading of the spec is correct. :-)


I don't know what kind of datatype library support Etna has or will
have, but theoretically, it could even allow using
Jing/MSV-compatible libraries via JNI. (That could actually be a
worthwhile feature considering that the Java API for datatype
libraries is probably the most popular one.)


There is a problem, however. One of the main features of RELAX NG is
that it allows ambiguous grammars: It is OK for a document to be
valid according to multiple derivations. RELAX NG DTD Compatibility
restricts grammar ambiguity, because the IDness of an attribute can't
remain ambiguous. It appears that enabling ID/IDREF checking wreaks
havoc with schemas that have not been written with this in mind.


I have not yet assessed the extent of the damage, but it could turn  
out that ID/IDREF checking needs to go in a separate schema like  
exclusions. (Does Etna support multiple schemas at a time effectively  
ANDing them?)



The part about form attributes referencing only form elements can be
checked by Schematron.


OK.

From an authoring standpoint, the *most* useful part of IDREF integrity
checking is to check against typos, not against misinterpretation of the
idref attribute's intent. :)


OK.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] id and xml:id

2006-04-02 Thread fantasai

Henri Sivonen wrote:

On Apr 2, 2006, at 18:56, fantasai wrote:


I'd rather see the id attribute restricted to an NCName token insofar
as possible. We can make an exception for Hixie's repetition  templates,
but otherwise I think it should be compatible with the XML ID syntax.


Do you mean common attrs should have a co-occurrence constraint that  
changes the datatype of the id attribute if the repeat attribute is  
present?


Yes. Or, at the very least, if the repetition module is loaded.


I was planning on defining the datatype of the id attribute as
  xsd:string {
    pattern = "\S+"
  }

NCName with the exception that it allows [ and ] will be one huge
regexp. (But doable, of course.) If that is what we want, the syntax
should probably be
(Letter | '_') (NCNameCharWithout02D1and00B7)* (('[' | #x02D1)
(NCNameCharWithout02D1and00B7)+ (']' | #x00B7))?
(NCNameCharWithout02D1and00B7)*

with the XML 1.0 definitions of Letter and NCNameChar.


Cool, that would even catch mismatched brackets. :)


The concept of "idness" is a useful one for many tools, and even if
browsers don't care what characters there are, other tools do. We  can't
express IDness in a schema if we insist on ignoring its syntactic
restrictions.


I didn't bother to make that argument, because I thought changing the  
language to fit schemas wouldn't go down well with Hixie. :-)


(In http://hsivonen.iki.fi/lists-in-attributes/ I tried to bring a  
general "less code and more reuse of correct code" argument into it  
instead of only playing the "it's incompatible with my schema  language 
of choice" argument.)


It's not "my schema language of choice", it's "the top three (by a long
shot) schema languages in use for XML".

I wasn't even expecting to be able to do IDREF integrity checks in  
RELAX NG. I was planning on doing it in Schematron or Java. Besides,  
general IDREF integrity checking does not check that, for example,  the 
form attribute references only form elements and not just any ids.


I would want that in the RelaxNG schema because there are editing tools
that hook into RelaxNG, but not many (or any besides validators) that can
hook into Schematron (Glazou, for example, is working on a RelaxNG-driven
editor.) RelaxNG /can/ do IDREF integrity checks. The part about form
attributes referencing only form elements can be checked by Schematron.
From an authoring standpoint, the *most* useful part of IDREF integrity
checking is to check against typos, not against misinterpretation of the
idref attribute's intent. :)

~fantasai


Re: [whatwg] id and xml:id

2006-04-02 Thread Henri Sivonen

On Apr 2, 2006, at 19:26, Anne van Kesteren wrote:

I agree. Note also that the repetition template also allows for characters that
are compatible with XML ID. Of course, this is only for valid documents... All
things should still be defined in a way that they take into account invalid,
yet well-formed, documents as well. (And HTML documents...)


I am interested in conforming DTD-invalid well-formed XHTML documents
and conforming HTML documents. I think that whatever is allowed as an
id attribute value in conforming HTML documents should also be
conforming as a value of an id attribute in XHTML (but not
necessarily conforming as an xml:id value) in order to allow
XHTML-serializability of conforming HTML docs.


(I am not interested in DTD-valid documents. I consider DTDs harmful.)

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] id and xml:id

2006-04-02 Thread Henri Sivonen

On Apr 2, 2006, at 18:56, fantasai wrote:


I'd rather see the id attribute restricted to an NCName token insofar
as possible. We can make an exception for Hixie's repetition templates,
but otherwise I think it should be compatible with the XML ID syntax.


Do you mean common attrs should have a co-occurrence constraint that  
changes the datatype of the id attribute if the repeat attribute is  
present?


I was planning on defining the datatype of the id attribute as
  xsd:string {
    pattern = "\S+"
  }

NCName with the exception that it allows [ and ] will be one huge
regexp. (But doable, of course.) If that is what we want, the syntax
should probably be
(Letter | '_') (NCNameCharWithout02D1and00B7)* (('[' | #x02D1)
(NCNameCharWithout02D1and00B7)+ (']' | #x00B7))?
(NCNameCharWithout02D1and00B7)*

with the XML 1.0 definitions of Letter and NCNameChar.
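
To show the shape of that regexp without the full XML 1.0 character tables, here is a deliberately simplified Python sketch. It is an approximation, not the real thing: Letter is reduced to ASCII letters and NCNameCharWithout02D1and00B7 to ASCII letters, digits, '.', '-' and '_', and the names LETTER, NAME_CHAR and is_template_id are invented for the example.

```python
import re

# ASCII stand-ins (assumptions) for the XML 1.0 productions.
LETTER = r"A-Za-z"             # stand-in for Letter
NAME_CHAR = r"A-Za-z0-9._\-"   # stand-in for NCNameChar minus U+02D1 and U+00B7

# (Letter | '_') NameChar* (('[' | U+02D1) NameChar+ (']' | U+00B7))? NameChar*
TEMPLATE_ID = re.compile(
    rf"[{LETTER}_]"
    rf"[{NAME_CHAR}]*"
    rf"(?:[\[\u02D1][{NAME_CHAR}]+[\]\u00B7])?"
    rf"[{NAME_CHAR}]*"
)


def is_template_id(value):
    """Return True if the whole value matches the simplified production."""
    return TEMPLATE_ID.fullmatch(value) is not None


for candidate in ["row[index]", "row[index].name", "row[index", "row]", "2abc"]:
    print(repr(candidate), is_template_id(candidate))
```

Because the bracketed part is a single optional group with both an opener and a closer, a lone '[' or ']' fails to match. The production does not tie the kind of opener to the kind of closer, so '[' followed by U+00B7 is accepted as well.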


The concept of "idness" is a useful one for many tools, and even if
browsers don't care what characters there are, other tools do. We can't
express IDness in a schema if we insist on ignoring its syntactic
restrictions.


I didn't bother to make that argument, because I thought changing the  
language to fit schemas wouldn't go down well with Hixie. :-)


(In http://hsivonen.iki.fi/lists-in-attributes/ I tried to bring a  
general "less code and more reuse of correct code" argument into it  
instead of only playing the "it's incompatible with my schema  
language of choice" argument.)


I wasn't even expecting to be able to do IDREF integrity checks in  
RELAX NG. I was planning on doing it in Schematron or Java. Besides,  
general IDREF integrity checking does not check that, for example,  
the form attribute references only form elements and not just any ids.
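
A minimal sketch of the kind of check that could be done outside RELAX NG, written here in Python with xml.etree.ElementTree rather than Schematron or Java. The function name check_form_refs and the sample document are invented for the example, namespaces are ignored, and the form attribute is assumed to hold a whitespace-separated list of ids:

```python
import xml.etree.ElementTree as ET


def check_form_refs(xml_text):
    """Return a list of problems found in form="" references (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Map every id value to the element that carries it.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    problems = []
    for el in root.iter():
        refs = el.get("form")
        if refs is None:
            continue
        for ref in refs.split():
            target = by_id.get(ref)
            if target is None:
                # What general IDREF integrity checking would catch (typos).
                problems.append(f"{ref}: no element with this id")
            elif target.tag != "form":
                # The extra check that general IDREF checking does not do.
                problems.append(f"{ref}: refers to <{target.tag}>, not <form>")
    return problems


SAMPLE = """<body>
  <form id="f1"/>
  <p id="intro"/>
  <input form="f1"/>
  <input form="intro"/>
  <input form="f2"/>
</body>"""

for problem in check_form_refs(SAMPLE):
    print(problem)
```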


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] id and xml:id

2006-04-02 Thread Anne van Kesteren

Quoting fantasai <[EMAIL PROTECTED]>:

I'd rather see the id attribute restricted to an NCName token insofar
as possible. We can make an exception for Hixie's repetition templates,
but otherwise I think it should be compatible with the XML ID syntax.


I agree. Note also that the repetition template also allows for characters that
are compatible with XML ID. Of course, this is only for valid documents... All
things should still be defined in a way that they take into account invalid,
yet well-formed, documents as well. (And HTML documents...)


--
Anne van Kesteren




Re: [whatwg] id and xml:id

2006-04-02 Thread fantasai

Henri Sivonen wrote:
Since UAs handle whitespace in the id attribute inconsistently (see
below), old specs imply or require whitespace trimming and ids with
whitespace are unreferenceable from whitespace-separated lists of ids,
I suggest adding the following language concerning document conformance:


The value of the id attribute must be a string that consists of one or
more characters matching the following production:
[#x21-#xD7FF]|[#xE000-#xFFFD]|[#x10000-#x10FFFF] (any XML 1.0 character
excluding whitespace).


I'd rather see the id attribute restricted to an NCName token insofar
as possible. We can make an exception for Hixie's repetition templates,
but otherwise I think it should be compatible with the XML ID syntax.

So

  xsd:ID {
    pattern = "\S*"
  }

The concept of "idness" is a useful one for many tools, and even if
browsers don't care what characters there are, other tools do. We can't
express IDness in a schema if we insist on ignoring its syntactic
restrictions.

~fantasai


Re: [whatwg] id and xml:id

2006-04-02 Thread Anne van Kesteren

Quoting Henri Sivonen <[EMAIL PROTECTED]>:

Also, I suggest requiring that elements must not have both id and
xml:id and requiring that xml:id must not occur in the HTML
serialization. (Again, from the document conformance point of view--
not disputing requirements on browsers.)


How could it occur in a HTML document? (Given that the browser in question is
namespace aware.) I'm assuming here that we're not talking about adding stuff
through the DOM given that you talk about serialization.



Rationale:
HTML doesn't have namespace processing of colonified names and the
xml:id spec is not designed for HTML. Allowing xml:id in HTML feels
intuitively wrong (perhaps even a bit evil :-).


I agree.



If an element had both an id attribute and an xml:id attribute with
different values, the document would not be HTML-serializable, which
would be bad.


Now I agree that's bad, but I think there is something to say for elements
having multiple IDs. (Even though that's not valid for some definition of it.)



(Obviously, even with only one kind of ID attribute on an element,
in round tripping from XHTML to HTML to XHTML, the information about
whether the original attribute was id or xml:id is lost just like
the information about whether a table had a tbody is lost.)


Interesting point. I think  should be required in XHTML
personally to go
against that. It just doesn't make sense the way it is now.



If an element was allowed to have an id attribute and an xml:id
attribute with the same value, the following constraint from xml:id
spec would be violated even for conforming docs:
"An xml:id processor should assure that the following constraint holds:
* The values of all attributes of type “ID” (which includes
all xml:id attributes) within a document are unique."
( http://www.w3.org/TR/xml-id/ )
Assuming, of course, that the XHTML5 id can still be considered an ID
in the XML sense.


It should be considered an ID in the XML sense for getElementById and friends.



Finally, as the ultimate ID nitpicking, the spec should state that it
is naughty of authors to turn attributes other than id and xml:id
into IDs via the DTD. (Well, using a DTD at all is naughty. :-)


But through DOM methods is ok? (I agree that DTDs are obsolete...)



Test case: http://hsivonen.iki.fi/test/wa10/adhoc/id.html


Interesting testcase!



Opera (weekly build 3312; note that Opera recently changed its
behavior to match the others with id=' c '):


Bah. I hope we can revert that... Do you have a similar test for xml:id? Opera
does (did?) pass the following for example:

http://annevankesteren.nl/test/xml/xml-id/008.xml

Cheers,

Anne


--
Anne van Kesteren




[whatwg] id and xml:id

2006-04-02 Thread Henri Sivonen
Since UAs handle whitespace in the id attribute inconsistently (see
below), old specs imply or require whitespace trimming and ids with
whitespace are unreferenceable from whitespace-separated lists of ids,
I suggest adding the following language concerning document conformance:


The value of the id attribute must be a string that consists of one
or more characters matching the following production:
[#x21-#xD7FF]|[#xE000-#xFFFD]|[#x10000-#x10FFFF] (any XML 1.0 character
excluding whitespace).


Also, I suggest requiring that elements must not have both id and  
xml:id and requiring that xml:id must not occur in the HTML  
serialization. (Again, from the document conformance point of view-- 
not disputing requirements on browsers.)


Rationale:
HTML doesn't have namespace processing of colonified names and the  
xml:id spec is not designed for HTML. Allowing xml:id in HTML feels  
intuitively wrong (perhaps even a bit evil :-).


If an element had both an id attribute and an xml:id attribute with  
different values, the document would not be HTML-serializable, which  
would be bad. (Obviously, even with only one kind of ID attribute on  
an element, in round tripping from XHTML to HTML to XHTML, the  
information about whether the original attribute was id or xml:id is  
lost just like the information about whether a table had a tbody is  
lost.)


If an element was allowed to have an id attribute and an xml:id  
attribute with the same value, the following constraint from xml:id  
spec would be violated even for conforming docs:

"An xml:id processor should assure that the following constraint holds:
* The values of all attributes of type “ID” (which includes all  
xml:id attributes) within a document are unique."

( http://www.w3.org/TR/xml-id/ )
Assuming, of course, that the XHTML5 id can still be considered an ID  
in the XML sense.
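
To make the uniqueness problem concrete, here is a small Python sketch using xml.etree.ElementTree. It assumes, as above, that the plain id attribute counts as an ID in the XML sense; the function name duplicate_ids and the sample markup are made up for the example:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(xml_text):
    """Return ID values that occur more than once across id and xml:id."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for el in root.iter():
        for name in ("id", XML_ID):
            value = el.get(name)
            if value is not None:
                counts[value] += 1
    return sorted(value for value, n in counts.items() if n > 1)


# One element carrying id and xml:id with the same value already counts
# as two ID-typed attributes sharing a value.
print(duplicate_ids('<div><p id="a" xml:id="a"/><p id="b"/></div>'))
```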


Finally, as the ultimate ID nitpicking, the spec should state that it  
is naughty of authors to turn attributes other than id and xml:id  
into IDs via the DTD. (Well, using a DTD at all is naughty. :-)


- -

Test case: http://hsivonen.iki.fi/test/wa10/adhoc/id.html
The script tries every id with a whitespaceless value to see if  
whitespace is trimmed before ID assignment.


Firefox:

id='a' PASS
id='2' PASS
id='<' PASS
id=',' PASS
id='ä' PASS
id=' c ' FAIL
id='\nd\n' PASS
id='\t\te\t\t' PASS
id='
f
' PASS

Opera (weekly build 3312; note that Opera recently changed its  
behavior to match the others with id=' c '):


id='a' PASS
id='2' PASS
id='<' PASS
id=',' PASS
id='ä' PASS
id=' c ' FAIL
id='\nd\n' PASS
id='\t\te\t\t' PASS
id='
f
' FAIL

Safari and IE 6:

id='a' PASS
id='2' PASS
id='<' PASS
id=',' PASS
id='ä' PASS
id=' c ' FAIL
id='\nd\n' FAIL
id='\t\te\t\t' FAIL
id='
f
' FAIL

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/