Re: [whatwg] PaceEntryMediatype

2006-12-01 Thread Thomas Broyer

2006/11/30, Ian Hickson:

On Thu, 30 Nov 2006, Thomas Broyer wrote:

 I'd prefer basing autodiscovery on the media types and not at all on the
 relationships. A feed relationship would only help finding the living
 resource (similar to rel=current in the Atom Relationship Registry)
 if you're not already on it (in which case, rel=alternate would be
 used).

 UAs would then obviously continue to support autodiscovery using
 alternate all-over-the-place, this would just be a lucky side-effect;
 and everyone would be happy.

So as far as I can tell, that's what HTML5 currently requires. Am I
interpreting you correctly?


Hmm, I'm afraid you aren't.

For some background, see these mails on the Atom lists:
http://www.imc.org/atom-syntax/mail-archive/msg19100.html
http://www.imc.org/atom-syntax/mail-archive/msg19107.html

There's a parallel discussion on the Atom lists about the Atom media types.

A summary of my problem with HTML5's autodiscovery:
- there shouldn't be a 'rel' value for subscribability,
subscribability is a matter of whether and how a UA can process
content from a particular media type
- HTML5 shouldn't say anything about which media type is
subscribable: application/atom+xml can be an Atom Entry, and there
might be other subscribable media types (some aggregators allow you to
subscribe to HTML); in other words, there shouldn't be any assumption
of subscribability *from within the spec*.
- rel=feed could be useful, but as a real relationship between
resources (the resource pointed to by a rel=feed link is a 'feed' in
which the current resource believes it appears or has appeared as
a contained item), not as defined currently in HTML5.

Actually my main problems are:
- the definition of rel=feed
- the assumption that rel=alternate+Atom or rel=alternate+RSS is
equivalent to rel=feed alternate

--
Thomas Broyer


Re: [whatwg] HTML syntax: shortcuts for 'id' and 'class' attributes

2006-12-01 Thread Martin Atkins

Andrew Fedoniouk wrote:

| 
|  <p.myclass>...</p> is equivalent to
|  <p class="myclass">...</p>
| 
|
| HTML5 is meant to be backwards compatible, so this is out of the question.

And where do you see problems with backward compatibility?
Or let's put it this way: what would be a definition of backward compatibility in
terms of HTML5?




Anything specified in HTML5 must degrade gracefully in pre-HTML5
software. Unless at the very least current browsers parse <p.myclass> as
<p>, this proposal is unworkable under this constraint.
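
As a rough illustration of the constraint above, here is a minimal sketch that uses Python's html.parser as a stand-in for a tolerant pre-HTML5 tokenizer. That is an assumption: it is not one of the browsers under discussion, and real browsers may behave differently. The helper name start_tags is hypothetical. The point is only that such a tokenizer reports an element whose name is p.myclass, not a p element with a class attribute.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every start tag (name and attributes) the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def start_tags(markup):
    """Hypothetical helper: return the start tags found in a markup string."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.seen


# The shortcut syntax yields a differently named element, with no class.
print(start_tags("<p.myclass>text</p>"))
# The long form yields a p element carrying the class attribute.
print(start_tags('<p class="myclass">text</p>'))
```

If legacy software behaves like this stand-in, CSS selectors and scripts written against p elements would simply not match the shortcut form.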





Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-12-01 Thread Henri Sivonen

On Dec 1, 2006, at 04:15, Michel Fortin wrote:

that their valid XHTML1 documents served as text/html, when updated  
to XHTML5, are now called valid HTML5 documents by the validator.


Except:
 * xmlns is illegal in HTML5.
 * xml:lang vs. lang.
 * base vs. xml:base.
 * <meta http-equiv=...> vs. <?xml version='1.0' encoding=...?>

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] HTML syntax: shortcuts for 'id' and 'class' attributes

2006-12-01 Thread Benjamin Hawkes-Lewis
On Fri, 2006-12-01 at 00:15 -0800, Andrew Fedoniouk wrote:

 Perhaps the solution lies in creating an Open HTML/CSS/Script
 specification that would set up the conditions for competition among
 various approaches and technologies. Who knows?

I doubt it. HTML persists as a mainstream format because Internet
Explorer cannot handle application/xhtml+xml. I predict that will change
in IE8. There are other old-style SGML-based languages, but I haven't
seen much use of them on the web, which isn't surprising as browsers
aren't really SGML readers. And adding existing XML-based languages to
the text/html world is tricky (see the problems with MathML and HTML5).
There are now a /lot/ of XML-based languages. Many of these have not
made it into mainstream browsers yet, like TEI and DocBook. But as browser
support for XML improves, any challenger is likely to be up against
XHTML 2 or Web Applications 1.0, not HTML. HTML is a small world of
legacy serialization; XML is a big world of possibility, if for no other
reason than it's easy to create new XML languages.

In practice, competition between such languages will only work if we
develop ways of dealing with differing levels of browser support. And
that means either describing one language in terms of another, or
serving different serializations of the same content. Anything else
would be an accessibility nightmare.

--
Benjamin Hawkes-Lewis



[whatwg] markup as authored in practice (was: something about slashes)

2006-12-01 Thread Robert Sayre

On 11/30/06, Ian Hickson [EMAIL PROTECTED] wrote:


 I'd gladly put in a <!DOCTYPE html> in my page, the question is: would
 the WHATWG be willing to meet me half way and allow xmlns attributes in
 a very select and carefully prescribed set of locations?

This seems like a bad idea. If you have HTML, parse it as HTML. If you
have XML, parse it as XML. Don't try to use an XML parser to parse HTML or
vice versa. The syntaxes, although superficially similar to the extent
that it is possible to make a single document that is parsable using
either processor, are not similar enough to be treated equivalently.


I think the point is more subtle.



 My theory is that we live in a cut and paste world, one based on partial
 understanding.  Few understand DOCTYPEs and xmlns attributes, mostly
 people crib from something that works.

Too true.


I haven't done a study, but the observation you've agreed with is very
accurate in my experience. If you assume that most people don't
understand xmlns attributes, and most people use very few namespaces,
you'll find that xmlns attributes that bind the default namespace are
most common. I suspect this is because they are very easy to cut and
paste in a modular fashion. In syndication feeds, there are colloquial
default namespace prefixes (dc *always* means Dublin Core), but that
is an edge case.

When the 'xmlns:dc' attribute name is encountered, most people are
using it as a magic flag. It is almost never a good example of
decentralized naming, with an arbitrary prefix and scope.

So, is it poisonous to allow

<!DOCTYPE html>
<html lang="en-US">
<head> <title> Demonstration </title> </head>
<body>
<div>
 <p>hmm</p>
 <svg width="100" height="100" xmlns="http://www.w3.org/2000/svg"
style="float:right">
 </svg>
</div>
</body>
</html>

in HTML5? Basically, user-defined tags would be allowable if the
boundary between HTML5 and those tags were delineated with a URI in an
attribute with the name 'xmlns' (no prefix machinery). This suggestion
should not be confused with using an XML parser or XML Namespaces to
process HTML5.

--

Robert Sayre


[whatwg] HTML syntax: Tag omission and attributes

2006-12-01 Thread Simon Pieters

Hi,

It is obvious, but should still be specified that start tags with attributes 
can't be omitted.


Regards,
Simon Pieters





Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-12-01 Thread Rimantas Liubertas

2006/12/1, Ian Hickson [EMAIL PROTECTED]:
...

 An example of something that is NOT implemented interoperably is
 <script src="..."/>.

As far as I can tell, <script/> is handled by all browsers the same way as
<script>. How is it not interoperable?


That's true; however, what happens depends on the browser and on the
presence of </script> in the code.

When IE encounters <script type="text/javascript" src="somescript.js" />
it swallows everything after it as the content of the script element. If
there is no </script> in the source, that's it.

Firefox likes consistency: <script type="text/javascript"
src="somescript.js" /> works OK.

This is OK too:

<script type="text/javascript" src="somescript.js" />
<p>some text</p>
<script type="text/javascript" src="somescript2.js" />

However,

<script type="text/javascript" src="somescript.js" />
<p>some text</p>
<script type="text/javascript" src="somescript2.js"></script>

produces only a single SCRIPT in the DOM tree, swallowing the paragraph
and the second script.

Opera handles the last example just fine.

Regards,
Rimantas
--
http://rimantas.com/


[whatwg] Editorial: Tag omission

2006-12-01 Thread Simon Pieters

Hi,

This sentence:

  However, a start tag must never be omitted if the element
  to which it belongs is immediately preceeded by another
  element with the same name, whose end tag has been omitted.

AFAICT, this only applies to colgroup. Why not move this requirement to 
the colgroup entry?


All entries on start tags mention the head element; each mention should be
replaced with the relevant entry's element instead.


The /colgroup entry has a markup error (s/pan/span/).

Regards,
Simon Pieters





[whatwg] lang vs. xml:lang; id vs. xml:id

2006-12-01 Thread Michel Fortin

The spec tells us:

The lang attribute only applies to HTML documents. Authors must not  
use the lang attribute in XML documents. Authors must instead use  
the xml:lang attribute, defined in XML. [XML]


To determine the language of a node, user agents must look at the  
nearest ancestor element (including the element itself if the node  
is an element) that has a lang or xml:lang attribute set. That  
specifies the language of the node.


If both the xml:lang attribute and the lang attribute are set, user  
agents must use the xml:lang attribute, and the lang attribute must  
be ignored for the purposes of determining the element's language.


While the requirement for authors is pretty clear (HTML: lang; XHTML:  
xml:lang), it seems to me that the user agent is asked to always  
favour xml:lang even in an HTML context. Is this really what's  
intended? I think this ought to be clarified.
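
To make the quoted lookup rule concrete, here is a minimal sketch of it as I read it. The Node class and the language_of function are hypothetical stand-ins, not DOM APIs, and attributes are keyed by plain strings ("lang", "xml:lang"), which sidesteps any namespace handling. That simplification is an assumption of the sketch, not something the spec text above states.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for an element: attributes plus a parent link."""
    name: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Node"] = None


def language_of(node):
    """Walk from the node up through its ancestors.

    The nearest element carrying lang or xml:lang decides the language;
    when one element has both, xml:lang wins and lang is ignored.
    """
    current = node
    while current is not None:
        if "xml:lang" in current.attrs:
            return current.attrs["xml:lang"]
        if "lang" in current.attrs:
            return current.attrs["lang"]
        current = current.parent
    return None  # no element in the chain declares a language


root = Node("html", {"lang": "en"})
para = Node("p", {"lang": "fr", "xml:lang": "de"}, parent=root)
span = Node("span", parent=para)
print(language_of(span))
```

Read literally, the rule makes no distinction between HTML and XML documents, which is exactly the behaviour being questioned here.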



Also:


The id DOM attribute must reflect the id content attribute.


Does that mean it should not reflect xml:id even when id is not defined?


Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/




[whatwg] Editorial comments on WebApps 1.0 section 9.1.2

2006-12-01 Thread Elliotte Harold
The contents of the element must be placed between just after the start 
tag (which might be implied, in certain cases) and just before the end 
tag (which again, might be implied in certain cases).


I wonder about the "just after" and "just before". Is there something in
the middle that is not just after and just before? I think it might be
clearer to state, "The contents of the element must be placed between
the start tag (which might be implied, in certain cases) and the end
tag (which again, might be implied in certain cases)." or "The contents
of the element must be placed after the start tag (which might be
implied, in certain cases) and before the end tag (which again, might be
implied in certain cases)."



And what about the word "placed"? That seems to suggest that one could
put this content somewhere else. I think of it more as the placement
defining what is and is not the content, so how about:


The content of the element is all the text between the start tag (which 
might be implied, in certain cases) and the end tag (which again, might 
be implied in certain cases).


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/


[whatwg] Editorial: hyphenating start tag and end tag

2006-12-01 Thread Elliotte Harold
The XML 1.0 and most related specifications use the hyphenated spellings
"start-tag" and "end-tag". The Web Apps 1.0 spec uses the unhyphenated
forms "start tag" and "end tag".


While I'm not sure there's a fundamental reason to prefer one form over 
the other, the copy editor in me would prefer to be consistent with the 
XML 1.0 spelling.


This hyphenated form has also been enshrined in the O'Reilly Default 
Stylesheet and Word List 
http://www.oreilly.com/oreilly/author/stylesheet.html for the same 
reasons, and is thus the preferred form in O'Reilly books.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]


[whatwg] 9.1.2.1: trailing slash and atheism

2006-12-01 Thread Elliotte Harold

9.1.2.1 states:

Then, if the element is one of the void elements, then there may be a 
single U+002F SOLIDUS character. This character has no effect except to 
appease the markup gods. As this character is therefore just a symbol of 
faith, atheists should omit it.


The second sentence is false, and also likely to cause unnecessary 
conflict with fundamentalists who don't understand markup and don't get 
the joke. But mostly it's false. I suggest rewriting as follows:


This character has no effect when the document is parsed by an HTML5 
parser. However, if the document is parsed by an XML parser, the
trailing slash converts the tag into an empty-element tag, and thereby 
makes an otherwise malformed element well-formed.
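
The well-formedness half of this claim can be checked with a short sketch using Python's xml.etree.ElementTree. The is_well_formed helper and the sample strings are hypothetical, and this says nothing about how an HTML5 parser treats the slash.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Without the slash, the XML parser never sees br closed.
print(is_well_formed("<p>line one<br>line two</p>"))
# With the slash, br is an empty-element tag and the fragment parses.
print(is_well_formed("<p>line one<br/>line two</p>"))
```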


--
Elliotte Rusty Harold  [EMAIL PROTECTED]


[whatwg] Editorial: dfns in TOC

2006-12-01 Thread Simon Pieters

Hi,

Since there are some dfns in headings, the table of contents now also 
contains dfns, resulting in duplicate defined terms. Since the spec 
doesn't allow duplicate defined terms, I guess this was not intentional...


Regards,
Simon Pieters





Re: [whatwg] 9.1.2.1: trailing slash and atheism

2006-12-01 Thread James Graham

Elliotte Harold wrote:

This character has no effect when the document is parsed by an HTML5 
parser. However, if the document is parsed by an XML parser, the
trailing slash converts the tag into an empty-element tag, and thereby 
makes an otherwise malformed element well-formed.


If you're trying to parse an HTML5 document with an XML parser, you're doing
something really screwy anyway.


--
Eternity's a terrible thought. I mean, where's it all going to end?
 -- Tom Stoppard, Rosencrantz and Guildenstern are Dead


Re: [whatwg] 9.2.1.2.3 spaces between quoted attribute values

2006-12-01 Thread Anne van Kesteren
On Fri, 01 Dec 2006 13:16:28 +0100, Elliotte Harold  
[EMAIL PROTECTED] wrote:
Attributes names and unquoted attribute values must be separated from  
each other and from the tag name and the U+002F SOLIDUS character  
mentioned below (if present) by one or more space characters.


Is this then legal?

<p id="p1"class="foo">

Shouldn't quoted attribute values also be separated by space from  
attribute names?


It has been legal in HTML for some time now...


--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


[whatwg] Editorial: code point

2006-12-01 Thread Elliotte Harold
The Unicode spec spells "code point" as two words; the Web Apps 1.0 spec
uses one: "codepoint". I suggest we follow the Unicode spelling.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]


[whatwg] Editorial: dfn s/term given by the contents/term/

2006-12-01 Thread Simon Pieters

Hi,

The dfn element:

  The dfn element represents the defining instance of a term.
  The paragraph, definition list group, or section that contains
  the dfn element contains the definition for the term given
  by the contents of the dfn element.

Given the definition of "defining term" two paragraphs later, the term of
the dfn element is not always the contents of the element. I suggest 
replacing the above text with:


  The dfn element represents the defining instance of a term.
  The paragraph, definition list group, or section that contains
  the dfn element contains the definition for the _term_ of
  the dfn element.

Regards,
Simon Pieters




[whatwg] Valid Unicode

2006-12-01 Thread Elliotte Harold

In 9.1.3 we see

Text must consist of valid Unicode characters other than U+. Text 
should not contain control characters other than space characters.



Later in 9.2.3.1 we find:

If the number is not a valid Unicode character (e.g. if the number is 
higher than 1114111), or if the number is zero, then return a character 
token for the U+FFFD REPLACEMENT CHARACTER character instead.



I do not think the Unicode spec defines the notion of a valid Unicode 
character. (It does define a valid Unicode code unit sequence, but 
that's a little different. A code unit sequence generally consists of 
more than one character.) Thus I suggest we need to be more precise here 
about what is and is not a valid Unicode character. In particular:



1. Are private use characters allowed?
2. Are control characters allowed (probably yes, based on other parts of 
the spec).

3. Are surrogate characters allowed? (probably no)
4. Are non-characters beyond 10 allowed (no)
5. Are reserved but currently undefined characters allowed (yes)
6. Are noncharacters U+FDD0..U+FDEF allowed (?)
7. Are the noncharacters from the last two characters of each plane 
allowed (?)
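
To show that these categories can at least be told apart mechanically, here is a minimal sketch of a classifier. The classify function and its labels are hypothetical, and it takes no position on which categories the spec should allow. It relies on Python's unicodedata module, so the "unassigned" answer depends on the Unicode version shipped with the interpreter.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # 1114111, the upper bound quoted from 9.2.3.1


def classify(cp):
    """Hypothetical helper: label a code point with one coarse category."""
    if not 0 <= cp <= MAX_CODE_POINT:
        return "out of range"
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Co":
        return "private use"
    if category == "Cc":
        return "control"
    if category == "Cn":
        return "unassigned"
    return "assigned"


for sample in (0x41, 0x85, 0xE000, 0xD800, 0xFDD0, 0x1FFFE, 0x110000):
    print(hex(sample), classify(sample))
```

A definition of "valid Unicode character" could then be written as an explicit list of which of these labels are permitted.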



--
Elliotte Rusty Harold  [EMAIL PROTECTED]


[whatwg] Complex annotations (was Re: Element content models)

2006-12-01 Thread Benjamin Hawkes-Lewis
I wrote:
 However, I would vehemently stress that it is not that uncommon
 for notes and marginalia to themselves have notes or marginalia,

Then Michael(tm) Smith asked:

 I don't doubt that there are some, but are you aware of any
 specific examples?

Well, most famously (and if you really want your head to spin), have a
look at that masterpiece of hypertext avant la lettre, Pierre
Bayle's /Dictionnaire historique et critique/:

http://www.lib.uchicago.edu/efts/ARTFL/projects/dicos/BAYLE/

And then see how impoverished the experience becomes when translating
into the crude mechanics of current hypertext at:

http://www.pierrebayle.com/

For various reasons (e.g. less prestige in demonstrating familiarity
with the classics; more open reliance on translation; the crassness of
the world of commercial publication; the ready availability of decent
libraries in the rich Western countries), modern scholarship has tended
to have simpler notes.

One major exception however are critical editions. For example, have a
look at Harold Jenkins's magisterial edition of /Hamlet/ (1982). Jenkins
adopts a tripartite scheme of notes. More than half of each page of the
text is filled with notes. Immediately below the text of the play, you
will find notes giving collations and emendations of the text. Below that
you will find short discursive notes. Often a discursive note will
include the abbreviation LN, indicating that you should hunt down the
line reference in the back of the book to find Jenkins discoursing on
some subject in exhaustive detail in Longer Notes. (And this is on top
of an introduction longer than the play and an appendix to boot!)

Michael(tm) Smith also raised the problem of how to render such
annotations:

 If you have, for example, numbered footnotes, then how do you number a
 footnote to a footnote? And how would an application programmatically
 determine how to number it? If you have marginalia that annotate other
 marginalia, where do you place the additional marginalia?

My considered judgement is that these are all excellent questions to
which we need well-thought out and tested solutions, rather than reasons
to arbitrarily restrict hypertextual annotation to a single dimension.

My less considered suggestion about what those solutions might entail
would be annotation sets which could group (say) footnotes and
endnotes, or notes and long notes. Something like: <note
set="collations"> and <note set="longnotes">.

I don't think browsers would have any great difficulty with applying a
default priority of numbering systems. I would note that removing the
numbering of ordered elements from HTML into CSS was a serious error,
because numbering is part of the reference structure of a text,
not /just/ a matter of disposable styles. I think I've even seen legal
documents resort to absurd markup like:

<ul>
<li>2.i The aforementioned agree to [...] </li>
<li>2.ii [...] </li>
</ul>

The form of numbering properly belongs in an attribute, which would be
best inherited from noteset, for example:

<noteset id="longnotes" numbering="alphabetic">

If we wanted to provide finer-grain control it would be a good idea to
also allow <note set="longnotes" referenceSymbol="‡">
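
As a very rough sketch of how a UA might derive reference marks from per-set numbering, consider the following. Everything here is hypothetical: the number_notes and alphabetic functions, the "decimal" style name, and the idea that each set keeps an independent counter are assumptions for illustration. Only the "alphabetic" value and the set names come from the suggestion above.

```python
def alphabetic(n):
    """Render 1, 2, ... 26, 27 as a, b, ... z, aa (bijective base 26)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return letters


# Assumed style names; only "alphabetic" appears in the proposal.
STYLES = {"decimal": str, "alphabetic": alphabetic}


def number_notes(notes, set_styles):
    """Assign a reference mark to each note.

    notes: list of (note_id, set_name) pairs in document order.
    set_styles: maps a set name to its numbering style; unlisted sets
    fall back to "decimal". Each set is counted independently.
    """
    counters = {}
    marks = {}
    for note_id, set_name in notes:
        counters[set_name] = counters.get(set_name, 0) + 1
        style = set_styles.get(set_name, "decimal")
        marks[note_id] = STYLES[style](counters[set_name])
    return marks


notes = [("n1", "collations"), ("n2", "longnotes"), ("n3", "collations")]
print(number_notes(notes, {"longnotes": "alphabetic"}))
```

Notes on notes would need a rule for nesting on top of this; the sketch only covers flat sets.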

I also think it would be best if we could have notes that refer outside
of the main document. XHTML 2 could do something like:

<note set="longnotes" href="longnotes.html#note-d" />

Unfortunately, because of the need for backwards compatibility, HTML5
would presumably have to do something lamer like:

<a rel="note:longnotes" href="longnotes.html#note-d"><note
set="longnotes" /></a>

With regards to positioning, you can put annotations anywhere: inserted
into the text, at the top/bottom/left/right of the viewport, at the end
of the document, in a separate document, in a popup ... there are all sorts
of possibilities.

Sorry that these thoughts are a bit rough.

--
Benjamin Hawkes-Lewis



Re: [whatwg] lang vs. xml:lang; id vs. xml:id

2006-12-01 Thread Lachlan Hunt

Michel Fortin wrote:

The spec tells us:
If both the xml:lang attribute and the lang attribute are set, user 
agents must use the xml:lang attribute, and the lang attribute must be 
ignored for the purposes of determining the element's language.


While the requirement for authors is pretty clear (HTML: lang; XHTML: 
xml:lang), it seems to me that the user agent is asked to always favour 
xml:lang even in an HTML context. Is this really what's intended? I 
think this ought to be clarified.


http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2005-April/003652.html

--
Lachlan Hunt
http://lachy.id.au/


[whatwg] xml:lang and xmlns in HTML

2006-12-01 Thread Michel Fortin

Le 1 déc. 2006 à 3:47, Henri Sivonen a écrit :


On Dec 1, 2006, at 04:15, Michel Fortin wrote:

that their valid XHTML1 documents served as text/html, when  
updated to XHTML5, are now called valid HTML5 documents by the  
validator.


Except:
 * xmlns is illegal in HTML5.
 * xml:lang vs. lang.
 * base vs. xml:base.
 * <meta http-equiv=...> vs. <?xml version='1.0' encoding=...?>


Ok, fine. The document still has to be non-conformant with one of the  
two syntaxes, and that's always true since xmlns is required for  
XHTML but not allowed in HTML. Still, that list of differences is
amazingly short. Could it be shorter? Should it be?


I wonder if xml:lang and xmlns couldn't be made legal in HTML.
xml:lang would simply become conformant in HTML as a synonym for the
lang attribute; the spec already says that it should get the
correct treatment anyway. xmlns would only be allowable on <html> and
only with the HTML namespace as its value.


This would make it possible to have documents conformant with both
syntaxes at the same time. That's assuming you don't use <base> or
<meta http-equiv="">; in the cases where they're needed they'd have to be
changed to xml:base and <?xml ?>, but that's a lot simpler to do than
to change every instance of lang in a document to xml:lang, and it
can be avoided in the vast majority of cases.


This could also help reinforce the idea that it's the media type that
differentiates HTML from XHTML. It'd make many valid XHTML1 documents
out there conformant with HTML5 with a mere modification to the
doctype. Just like "/", xmlns and xml:lang are already pretty
common on text/html pages because of XHTML1. I concede however that
having the word "xml" in two places in the HTML language could make
things a little more confusing.


What do you think?


Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/




Re: [whatwg] lang vs. xml:lang; id vs. xml:id

2006-12-01 Thread Michel Fortin

Le 1 déc. 2006 à 8:33, Lachlan Hunt a écrit :

If both the xml:lang attribute and the lang attribute are set,  
user agents must use the xml:lang attribute, and the lang  
attribute must be ignored for the purposes of determining the  
element's language.


While the requirement for authors is pretty clear (HTML: lang;  
XHTML: xml:lang), it seems to me that the user agent is asked to  
always favour xml:lang even in an HTML context. Is this really  
what's intended? I think this ought to be clarified.


http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2005- 
April/003652.html


Okay, so if I understand correctly, "xml:lang" in the spec refers to the
"lang" attribute in the "xml" namespace, not to the "xml:lang"
attribute in the null namespace that you get with the HTML parser. It
makes sense from a DOM perspective, but it's misleading from a markup
perspective, so I still think it should be clarified.


And although it's less confusing, I think the same should be
clarified about xml:id: it's the "id" attribute in the "xml"
namespace, not the "xml:id" attribute in the null namespace that you
get with the HTML parser.
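
The distinction can be seen with two parsers from the Python standard library, used here only as stand-ins (an assumption: html.parser is not an HTML5 parser, but it illustrates the same effect). The helper names html_attr_names and xml_attr_names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

MARKUP = '<p xml:lang="en" xml:id="intro">Hello</p>'


class AttrNames(HTMLParser):
    """Collect attribute names exactly as the HTML tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.extend(name for name, _value in attrs)


def html_attr_names(markup):
    """Attribute names seen by an HTML tokenizer: no namespace processing."""
    parser = AttrNames()
    parser.feed(markup)
    parser.close()
    return parser.names


def xml_attr_names(markup):
    """Attribute names seen by an XML parser: the xml prefix is resolved."""
    return sorted(ET.fromstring(markup).attrib)


print(html_attr_names(MARKUP))  # plain names that merely contain a colon
print(xml_attr_names(MARKUP))   # names qualified with the XML namespace URI
```

The HTML side yields an attribute literally called xml:lang, while the XML side yields lang in the http://www.w3.org/XML/1998/namespace namespace, so the same markup produces two different attributes.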



Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/




Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Rimantas Liubertas wrote:
  
  As far as I can tell, <script/> is handled by all browsers the same
  way as <script>. How is it not interoperable?
 
 That's true, however, what happens depends on the browser and presence
 of </script> in the code.

Right, the interoperability problems with script are unrelated to 
trailing slashes. (Bugs have been filed to fix those differences.)

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Valid Unicode

2006-12-01 Thread Henri Sivonen

On Dec 1, 2006, at 14:38, Elliotte Harold wrote:


1. Are private use characters allowed?


I think the answer should be Yes, because not allowing them could  
make people subvert Unicode and use e.g. Latin-1 code points for a  
different purpose with a bogus font. Also, not allowing them would be  
a violation of Charmod requirements for specs.


2. Are control characters allowed (probably yes, based on other  
parts of the spec).


Personally, I'd like to make non-conforming the control characters  
that XML 1.0 disallows (in order to keep conforming HTML5 documents  
convertible to XHTML5) as well as C1 controls (because they have no  
legitimate use in HTML but are a sign of a common bug).



3. Are surrogate characters allowed? (probably no)


Surrogates are an artifact of UTF-16. They have no place on the  
character level. So I'd say No.



6. Are noncharacters U+FDD0..U+FDEF allowed (?)
7. Are the noncharacters from the last two characters of each plane  
allowed (?)


I don't have particularly strong feelings here. Putting those
characters in HTML is a bad idea, but allowing them is not a problem
for HTML5 to XHTML5 conversion and they aren't a common problem like  
C1 controls.


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] PaceEntryMediatype

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Thomas Broyer wrote:
 
 A summary of my problem with HTML5's autodiscovery:
 - there shouldn't be a 'rel' value for subscribability,
 subscribability is a matter of whether and how a UA can process
 content from a particular media type

Agreed. The spec doesn't mention subscribing, just that rel=feed means 
it's a syndication feed.


 - HTML5 shouldn't say anything about which media type is
 subscribable: application/atom+xml can be an Atom Entry, and there
 might be other subscribable media types (some aggregators allow you to
 subscribe to HTML); in other words, there shouldn't be any assumption
 of subscribability *from within the spec*.

Agreed, within the constraints of backwards compatibility. While it 
doesn't mention subscribing to them, there are two explicit values for the 
type= attribute which have been grandfathered into meaning rel=feed. 
This is needed for compatibility with existing content and existing UAs, 
and isn't something that we have any ability to change, given the 
widespread use of these types for that purpose.


 - rel=feed could be useful, but as a real relationship between 
 resources (the resource pointed to by a rel=feed link is a 'feed' in 
 which the current resource believes it appears or has appeared as a 
 contained item), not as defined currently in HTML5.

It sounds like you're describing rel=feed alternate, which is a 
syndication feed explicitly for the current document, as opposed to a 
rel=feed on its own, which is a syndication feed for any random subject.


 Actually my main problems are:
 - the definition of rel=feed

It's not clear to me what you think needs changing. Could you suggest an 
explicit set of changes that would satisfy you?


 - the assumption that rel=alternate+Atom or rel=alternate+RSS is
 equivalent to rel=feed alternate

This is out of our hands, sadly. There are literally hundreds of millions 
of deployed link elements that make those assumptions today. We can't 
break legacy UAs and documents.
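
For concreteness, the legacy equivalence described above might be sketched like this. It is an illustration under assumptions, not the spec's algorithm: the names LEGACY_FEED_TYPES, effective_rel and feed_links are hypothetical, and application/rss+xml is assumed to be the second grandfathered type (only application/atom+xml is named in this thread).

```python
from html.parser import HTMLParser

# Assumed pair of grandfathered types; the RSS value is a guess for illustration.
LEGACY_FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class LinkCollector(HTMLParser):
    """Collect the attributes of every link element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            self.links.append(dict(attrs))


def effective_rel(link):
    """Return the link's rel keywords, adding feed for the legacy cases."""
    rel = set((link.get("rel") or "").lower().split())
    media_type = (link.get("type") or "").strip().lower()
    if "alternate" in rel and media_type in LEGACY_FEED_TYPES:
        rel.add("feed")  # rel=alternate plus a feed type acts as "feed alternate"
    return rel


def feed_links(markup):
    """Return the href of every link that counts as a syndication feed."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return [link.get("href") for link in parser.links
            if "feed" in effective_rel(link)]


PAGE = """
<link rel="alternate" type="application/atom+xml" href="/atom.xml">
<link rel="feed" href="/other-feed">
<link rel="alternate" type="text/html" href="/print.html">
<link rel="stylesheet" href="/style.css">
"""
print(feed_links(PAGE))
```

Nothing in the sketch decides whether a UA offers to subscribe; it only computes which links are treated as feeds.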

Cheers,
-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] HTML syntax: Tag omission and attributes

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Simon Pieters wrote:
 
 It is obvious, but should still be specified that start tags with 
 attributes can't be omitted.

Good point! Fixed.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Editorial: Tag omission

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Simon Pieters wrote:
 
 This sentence:
 
   However, a start tag must never be omitted if the element
   to which it belongs is immediately preceeded by another
   element with the same name, whose end tag has been omitted.
 
 AFAICT, this only applies to colgroup. Why not move this requirement 
 to the colgroup entry?

Fixed.

 All entries on start tags mention the head element, should be replaced 
 with the relevant entry's element instead.

Fixed.

 The /colgroup entry has a markup error (s/pan/span/).

Fixed.

Also there was duplication of the thead tag omission. Removed the first one.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] 9.1.2.1: trailing slash and atheism

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Elliotte Harold wrote:

 9.1.2.1 states:
 
 Then, if the element is one of the void elements, then there may be a single
 U+002F SOLIDUS character. This character has no effect [...]
 
 The second sentence is false [...] I suggest rewriting as follows:
 
 This character has no effect when the document is parsed by an HTML5 parser.

That's redundant. Parsing a document using this syntax with anything other 
than an HTML5 parser would be non-conforming.


 However, if the document is parsed by an XML parser, the trailing
 slash converts the tag into an empty-element tag, and thereby makes an 
 otherwise malformed element well-formed.

This section has nothing to do with XML. If the document was parsed by an 
XML parser, then there are much bigger problems afoot, such as MIME type 
mislabelling, or a faulty UA.

Cheers,
-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Editorial: dfns in TOC

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Simon Pieters wrote:
 
 Since there are some dfns in headings, the table of contents now also 
 contains dfns, resulting in duplicate defined terms. Since the spec 
 doesn't allow duplicate defined terms, I guess this was not 
 intentional...

Yeah, this is something that will get fixed when I write a new spec 
post-processor. In the meantime, the spec complies with HTML4, not HTML5. :-)



Re: [whatwg] 9.2.1.2.3 spaces between quoted attribute values

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Elliotte Harold wrote:

 Attribute names and unquoted attribute values must be separated from 
 each other and from the tag name and the U+002F SOLIDUS character 
 mentioned below (if present) by one or more space characters.
 
 Is this then legal?
 
 <p id="p1"class="foo">

Yes.


 Shouldn't quoted attribute values also be separated by space from 
 attribute names?

No, it isn't necessary. Browsers support this interoperably, and it was 
allowed in HTML4, so there are almost certainly documents depending on it. 
You're allowed to put spaces there, but not required to.
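
A minimal sketch of the point above, assuming Python's standard html.parser as a stand-in for a tolerant HTML tokenizer (it is not an HTML5 reference parser, and the AttrCollector and start_tags names are made up for illustration). It feeds the start tag from the question, with its quotes restored, and records the attributes the tokenizer reports:

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record (tag, attributes) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def start_tags(markup):
    """Return the list of (tag, attrs) pairs found in markup."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


# No space between the quoted value "p1" and the next attribute name.
no_space = start_tags('<p id="p1"class="foo">text</p>')
# The same tag with the optional space present.
with_space = start_tags('<p id="p1" class="foo">text</p>')

print(no_space)
print(with_space)
```

If this tokenizer behaves like the browsers described above, both calls should report the same two attributes, id and class.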



Re: [whatwg] Editorial: code point

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Elliotte Harold wrote:

 The Unicode spec spells "code point" as two words; the Web apps 1.0 spec 
 uses one: "codepoint". I suggest we follow the Unicode spelling.

Fair enough. Changed. Please let me know if I let any codepoints slip 
back in (which is very likely).



Re: [whatwg] HTML syntax: shortcuts for 'id' and 'class' attributes

2006-12-01 Thread Ian Hickson
On Thu, 30 Nov 2006, Boris Zbarsky wrote:
 
 No, because an HTML4 UA will not render that in any sort of reasonable 
 way (for example, in an HTML4 UA the <p.myclass> tag will never be 
 closed).

This exactly summarises why we can't do this.


On Thu, 30 Nov 2006, Andrew Fedoniouk wrote:
 
 Boris, what about this then:
 
 <p .myclass> ... </p>
 <p #myid> ... </p>
 
 (tag name and attribute delimited by a space)
 
 Can this be considered backward compatible enough?

I don't really see how this would be particularly beneficial. Saving two 
characters to specify an ID (and 5 for a class) at the cost of losing 
compatibility with all legacy UAs, seems pointless.


On Fri, 1 Dec 2006, Robert Sayre wrote:
 
 No. Try this rule of thumb: don't invent anything unless you absolutely 
 have to, or you're giving a name to something that is already happening.

Exactly.


On Fri, 1 Dec 2006, Andrew Fedoniouk wrote:
 
 Let's imagine that there are no such things as HTML5 and WHATWG yet.
 Only HTML 4.01, CSS and JavaScript in the wild.
 
 And here comes someone who will tell us: Hey, something is wrong in this 
 triad - it is not serving the needs of Web Applications well. So let's 
 start from HTML.
 
 No. Try this rule of thumb: don't invent anything unless you absolutely 
 have to, or you're giving a name to something that is already 
 happening.
 
 Absolutely applicable! Isn't it?

Yes... that's why we're not inventing anything unless we absolutely have 
to. Like <canvas>, or <datagrid>, or the various other new features in 
HTML5. (There aren't that many. They're all pretty vital.)


 In other words: what is so conceptually wrong with HTML 4.01 that 
 requires HTML5 to be designed?

HTML5 is HTML4, just better defined (fixing bugs in HTML4) and with a few 
new features to handle things HTML4 couldn't do.


Please let me know if there is something that was said in this thread that 
deserves further reply; I don't think there was (mostly this thread was 
people disagreeing with Andrew or off-topic discussion).

Cheers,


Re: [whatwg] xml:lang and xmlns in HTML

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Charles Iliya Krempeaux wrote:
  
  I thought XHTML-sent-as-text/html had explained in painful detail why 
  that's not a desirable end goal. Why would we want this?
 
 Do you have some links to that discussion?  I think I may have missed 
 it.
 
 (I know I probably don't qualify as a typical web developer, but... 
 I've actually been writing XHTML and returning it as text/html.)

http://www.hixie.ch/advocacy/xhtml is one often cited paper on the 
subject. It has not been updated to cater for HTML5, so some things in 
there are out of date, but it should give you a basic idea.



Re: [whatwg] PaceEntryMediatype

2006-12-01 Thread Robert Sayre

On 12/1/06, Kyle Marvin [EMAIL PROTECTED] wrote:


I'm still listening to the debate, but Mark's argument resonates with me.


Yes, Mark is starting to convince me as well.

--

Robert Sayre


Re: [whatwg] 9.1.2.1: trailing slash and atheism

2006-12-01 Thread Charles Iliya Krempeaux

Hello Ian,

On 12/1/06, Ian Hickson [EMAIL PROTECTED] wrote:


On Fri, 1 Dec 2006, Elliotte Harold wrote:

 9.1.2.1 states:

 Then, if the element is one of the void elements, then there may be a single
 U+002F SOLIDUS character. This character has no effect [...]

 The second sentence is false [...] I suggest rewriting as follows:

 This character has no effect when the document is parsed by an HTML5 parser.

That's redundant. Parsing a document using this syntax with anything other
than an HTML5 parser would be non-conforming.


 However, if the document is parsed by an XML parser, the trailing
 slash converts the tag into an empty-element tag, and thereby makes an
 otherwise malformed element well-formed.

This section has nothing to do with XML. If the document was parsed by an
XML parser, then there are much bigger problems afoot, such as MIME type
mislabelling, or a faulty UA.



Sometimes web developers parse (non-XML) HTML with an XML parser because
it's the tool they have on hand.

Consider a PHP developer who wants to analyse an HTML page. That developer may
try to use SimpleXML (http://php.net/simplexml) because that's what they have
on hand and know how to use.  There's no SimpleHTML available in PHP.

And while none of this is our fault, this is a situation some web
developers are going to run into.  (What else are they going to use?)
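
To make the situation concrete, here is a small sketch in Python rather than PHP (an assumption made only so the example is self-contained; xml.etree.ElementTree plays the role of SimpleXML and html.parser the role of a tolerant HTML tokenizer, and the helper names are invented). It shows that the trailing slash makes no difference to the HTML tokenizer but decides whether an XML parser accepts the input at all:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the names of all start tags, including <br/> style tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_well_formed(markup):
    """True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


plain = "<p>one<br>two</p>"      # void element without the slash
slashed = "<p>one<br/>two</p>"   # void element with the trailing slash

print(html_tags(plain), html_tags(slashed))
print(xml_well_formed(plain), xml_well_formed(slashed))
```

The HTML tokenizer sees the same two elements either way; the XML parser rejects the first string and accepts the second.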


See ya

--
   Charles Iliya Krempeaux, B.Sc.

   charles @ reptile.ca
   supercanadian @ gmail.com

   developer weblog: http://ChangeLog.ca/


[whatwg] HTML5 Edit Link Relation (was: PaceEntryMediatype)

2006-12-01 Thread Robert Sayre

On 12/1/06, James M Snell [EMAIL PROTECTED] wrote:


What is the purpose of using alternate links? What is a UA supposed to
do with 'em? Why did I as a content publisher choose to use the
alternate link relation? Are all of these links of equal value to all
UA's?  Are they all expected to be processed in the same basic way?
Should an archive feed be treated the same way as a subscription feed?



Excellent point. HTML has link relations for this purpose. The list is
open, so you don't really need to have a standard, but I suppose
putting them in Web Applications 1.0 would be a good idea.



  <link rel="alternate subscribe"
        type="application/atom+xml"
        href="feed.xml" />
  <link rel="alternate edit"
        type="application/atom.entry+xml"
        href="entry.xml" />
  <link rel="alternate subscribe"
        type="application/rss+xml"
        href="rss.xml" />


 <link rel="alternate feed"
       type="application/atom+xml"
       href="feed.xml" />
 <link rel="edit"
       type="application/atom+xml"
       href="entry.xml" />

I think this would be a great thing to standardize in the WHAT-WG.

--

Robert Sayre


Re: [whatwg] PaceEntryMediatype - rel-type instead

2006-12-01 Thread Ernest Prabhakar

On Dec 1, 2006, at 10:42 AM, Kyle Marvin wrote:
I see the separation but I'm still missing a clear justification
for it.  I don't see content-type as having anything to do with the
audience.  It's about what media format you'd get back if you
dereference the href and rel is about how you can interpret/interact
with it.  I feel like the primary audience for content-type is likely
to be used in selecting some type of parser when retrieving the
resource.  Orthogonal to this, the rel value assigns some semantic
meaning to the resource (what does the entry or feed describe) and
might also specify what interaction model you might expect via the
href (ex. "edit" implies APP edit semantics on an entry resource).


+1 to what Kyle said

-- Ernie P.



Re: [whatwg] PaceEntryMediatype

2006-12-01 Thread James M Snell
You're right that the differentiation in the content-type is of less
importance but without it there's no way for me to unambiguously
indicate that a resource has both an Atom Feed representation and an
Atom Entry representation.  The best I could do is say "This thing has
two Atom representations".  Keep in mind that I want to be able to
differentiate the types of alternate representations available without
having to look at any of the other rel keywords.

- James

Kyle Marvin wrote:
 [snip]
 I see the separation but I'm still missing a clear justification for
 it.  I don't see content-type as having anything to do with the
 audience.  It's about what media format you'd get back if you
 dereference the href and rel is about how you can interpret/interact
 with it.   I feel like the primary audience for content-type is likely
 to be used in selecting some type of parser when retrieving the
 resource.  Orthogonal to this, the rel value assigns some semantic
 meaning to the resource (what does the entry or feed describe) and might
 also specify what interaction model you might expect via the href (ex.
 edit implies APP edit semantics on an entry resource).
 
 Cheers!
 


Re: [whatwg] xml:lang and xmlns in HTML

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Sam Ruby wrote:
  
  Except that wouldn't be backwards compatible since xml:lang= isn't 
  treated as a language attribute in legacy UAs.
 
 I thought that the HTML definition of backwards compatibility was "If a 
 user agent encounters an attribute it does not recognize, it should 
 ignore the entire attribute specification (i.e., the attribute and its 
 value)."

In the context of features that already work in legacy UAs, backwards 
compatibility also implies that the feature should work as much as 
possible in legacy UAs. (Graceful fallback.)


   This would make it possible to have documents conformant with both 
   syntaxes at the same time.
  
  I thought XHTML-sent-as-text/html had explained in painful detail why 
  that's not a desirable end goal. Why would we want this?
 
 Perhaps the problem is that your reformulation of Michel's assertion 
 doesn't capture the essence of the perceived requirement.

I don't understand.


 And given the frequency with which this question comes up, there 
 probably is a kernel of validity hiding in there somewhere.

It actually doesn't come up that much, compared to other things (e.g. "how 
do I find what the selection in a text field is?" is more common than "how 
do I make my document conformant to XML and HTML at the same time?").


   This could also help reinforce the idea that it's the media type 
   that differentiates HTML from XHTML. It'd make many valid XHTML1 
   documents out there conformant with HTML5 with a mere modification 
   to the doctype.
  
  Not if they use things like <![CDATA[...]]> or the empty element 
  syntax on non-void elements, or any number of other XMLisms.
 
 Until yesterday, empty element syntax on void elements was also an 
 XMLism. Perhaps the question as to whether <![CDATA[..]]> should be 
 allowed should be explored with the same pragmatism as the empty/void 
 question was pursued.

But with <![CDATA[..]]>, namespaces, and most other XMLisms, the proposals 
fall down at the first step: they aren't compatible with legacy handling 
of HTML. The void element trailing / proposal only got considered 
because it was compatible with legacy UA handling, and would be of 
considerable help to authors who had fallen prey to the XHTML1 Appendix C 
fallacy and were trying to move to HTML5.


   What do you think?
  
  I don't think it's a goal for the two serialisations to have a common 
  subset.
 
 Whether it is a goal or not, it is a reality that the two serializations 
 are similar enough to confuse many.

I agree. However, until such time as all browsers support XHTML, I don't 
see any reason to use it. When all browsers support XHTML, then we can 
dump text/html altogether. Trying to use XHTML before XHTML is supported 
is putting the cart before the horse.

So, one of the two serialisations can be ignored, and authors need only 
use the latest version, namely HTML5.

Furthermore, HTML and XML are _different formats_. People don't use the 
same parser for RDF n3 and RDF XML, or the same parser for PNG and GIF, or 
the same parser for RelaxNG XML and RelaxNG Compact Syntax. Why would you 
use the same parser for XML and HTML? Treat them as different syntaxes. 
They have their own idiosyncrasies, conformance rules, parsing rules, and 
they only have a tiny amount of overlap. Treating them as the same 
language is not good design practice.

(There also seems to be an implicit assumption that XML is better than 
text/html. This certainly was true back when XML was well-defined and HTML 
was a mess of undefined reverse-engineering. However, HTML5 changes this; 
now, HTML is as well defined as XML, if not better. The assumption that 
XML is intrinsically better should be revisited.)



Re: [whatwg] HTML syntax: space between empty attribute and /

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Simon Pieters wrote:
 
   If an attribute using the empty attribute syntax is to be
   followed by another attribute or by one of the optional U+002F
   SOLIDUS (/) characters allowed in step 6 of the start tag
   syntax above, then there must be a space character separating
   the two.
 
 I don't see why it would be required to have a space between empty 
 attributes and the slash.

This was an oversight. Fixed.



Re: [whatwg] PaceEntryMediatype

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, James M Snell wrote:

 You're right that the differentiation in the content-type is of less 
 importance but without it there's no way for me to unambiguously 
 indicate that a resource has both an Atom Feed representation and an 
 Atom Entry representation.

Assuming that an atom feed is a feed, and an atom entry is an 
alternative format representation of the same document:

   <link rel="feed" href="feed.xml">
   <link rel="alternate" href="entry.xml">

...does what you are asking for according to HTML5, as does:

   <link rel="alternate" href="feed.xml" type="application/atom+xml">
   <link rel="alternate" href="entry.xml">

If an entry is something more special (e.g. if it is actually an edit 
interface), then register a new rel= value for it, e.g. rel=edit:

   <link rel="feed" href="feed.xml">
   <link rel="edit" href="entry.xml">
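
The three variants above can be checked against the rule as it is discussed in this thread: a link counts as a feed link if its rel contains feed, or if it contains alternate and its type is an Atom or RSS media type. A minimal sketch, assuming that reading of the draft (the function names and the exact set of media types are illustrative assumptions, not spec text):

```python
from html.parser import HTMLParser

# Assumed set of feed media types for the "alternate + type" case.
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class LinkCollector(HTMLParser):
    """Collect the attributes of every <link> element as a dict."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            self.links.append(dict(attrs))


def is_feed_link(link):
    """Apply the rule sketched above to one link's attributes."""
    rels = set((link.get("rel") or "").lower().split())
    media_type = (link.get("type") or "").strip().lower()
    if "feed" in rels:
        return True
    return "alternate" in rels and media_type in FEED_TYPES


def discover_feeds(markup):
    """Return the href of every link that the rule treats as a feed."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return [l["href"] for l in collector.links
            if is_feed_link(l) and l.get("href")]


first = '<link rel="feed" href="feed.xml"><link rel="alternate" href="entry.xml">'
second = ('<link rel="alternate" href="feed.xml" type="application/atom+xml">'
          '<link rel="alternate" href="entry.xml">')
third = '<link rel="feed" href="feed.xml"><link rel="edit" href="entry.xml">'

for page in (first, second, third):
    print(discover_feeds(page))
```

Under these assumptions all three pages yield only feed.xml; entry.xml is never mistaken for a feed because it carries neither the feed keyword nor a feed media type.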



Re: [whatwg] HTML syntax: space between empty attribute and /

2006-12-01 Thread Simon Pieters

Hi,

From: Ian Hickson [EMAIL PROTECTED]

   If an attribute using the empty attribute syntax is to be
   followed by another attribute or by one of the optional U+002F
   SOLIDUS (/) characters allowed in step 6 of the start tag
   syntax above, then there must be a space character separating
   the two.

 I don't see why it would be required to have a space between empty
 attributes and the slash.

This was an oversight. Fixed.


The entire text shouldn't have been removed. Empty attributes still need a 
space after them if they are followed by another attribute. Sorry for being 
unclear.


It should say:

  If an attribute using the empty attribute syntax is to be
  followed by another attribute then there must be a space
  character separating the two.
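
The reason the space is still needed between two empty attributes, but not before the slash, can be seen with any tokenizer. A small sketch using Python's html.parser (an assumption made for illustration; it is a tolerant tokenizer, not the HTML5 parsing algorithm, and the helper names are invented):

```python
from html.parser import HTMLParser


class FirstTagAttrs(HTMLParser):
    """Remember the attribute list of the first start tag only."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = attrs


def attrs_of(markup):
    parser = FirstTagAttrs()
    parser.feed(markup)
    parser.close()
    return parser.attrs


separated = attrs_of("<input disabled checked>")     # two empty attributes
run_together = attrs_of("<input disabledchecked>")   # space omitted
before_slash = attrs_of("<input disabled/>")         # no space before the slash

print(separated)
print(run_together)
print(before_slash)
```

Without the space the two names simply merge into one attribute, whereas the slash already terminates the attribute name on its own.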

Regards,
Simon Pieters



Re: [whatwg] Editorial: dfn s/term given by the contents/term/

2006-12-01 Thread fantasai

Simon Pieters wrote:

Hi,

The dfn element:

  The dfn element represents the defining instance of a term.
  The paragraph, definition list group, or section that contains
  the dfn element contains the definition for the term given
  by the contents of the dfn element.

Given the definition of defining term two paragraphs later, the term 
of the dfn element is not always the contents of the element. I suggest 
replacing the above text with:


  The dfn element represents the defining instance of a term.
  The paragraph, definition list group, or section that contains
  the dfn element contains the definition for the _term_ of
  the dfn element.


I'd just strike the contents.

  definition for the term given by the dfn element.

~fantasai


Re: [whatwg] PaceEntryMediatype

2006-12-01 Thread James M Snell
I could but after the discussions this week I'm not sure it's worth it.

Yes, everything can be done using different rel values; the content-type
thing is more just an annoyance than anything else. I'll just make sure
that I never link my Atom entry documents using alternate (even tho
that's what they are).

- James

Kyle Marvin wrote:
 [snip]
 Can you explain why you want this?   I'm not trying to be relentless I
 just want to make sure I'm not missing something important while pushing
 back.
 
 -- Kyle
 
 


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-12-01 Thread Mike Schinkel
Lachlan Hunt wrote:
 Mike Schinkel wrote:
  1.) I read the FAQ http://blog.whatwg.org/faq/ and it seemed to imply 
  that HTML 5 and XHTML were not at odds with each other?  Did I 
  misread that, because from comments on this thread I get the 
  impression that might not be the case.
  
  2.) A similar question, but is the goal for HTML5 and XHTML to slowly 
  converge, or is the goal for them to diverge?
 
 This issue was explained in detail in this recent blog entry.
 
 http://blog.whatwg.org/html-vs-xhtml
 

Thanks for the reference. I'll check it out.

-Mike Schinkel
http://www.mikeschinkel.com/blogs/
http://www.welldesignedurls.org/

 



Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-12-01 Thread Mike Schinkel
Ian Hickson wrote:
 
 On Thu, 30 Nov 2006, Mike Schinkel wrote:
  1.) I read the FAQ http://blog.whatwg.org/faq/ and it seemed to imply 
  that HTML 5 and XHTML were not at odds with each other?  Did I misread
  that, because from comments on this thread I get the impression that 
  might not be the case.
 
 They're just different serialisations. One is for text/html, the other 
 for XML. You can use one or the other, it basically only depends on 
 whether you want to send it as text/html or not.
 

That is a good explanation, thank you.

Even though they are both serializations, the vast majority of people
producing HTML/XHTML are not doing it by serializing, they are doing it by
string concatenation and merging templates. Unfortunately, no matter how
much it's lamented that this is the wrong way to do it, it's not going to
change by a significant amount and hence it would seem to me to be the
enlightened thing to acknowledge and strive to converge HTML with XHTML over
time, as much as reasonably possible.

Another very beneficial thing would be to ensure there are reference
implementations of open source or public domain serializers for XHTML and
HTML as part of the spec in all major languages and platforms. That way
there would be a fighting chance that the next generation of web apps would
implement proper serialization and pipelines instead of reverting to string
concatenation because the other is just too hard. That way there is a
greater likelihood that the next WordPress will be developed with a proper
architecture.
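
As a rough idea of what "serializing instead of concatenating" could look like, here is a sketch using Python's xml.etree.ElementTree, which can already write one in-memory tree in either syntax. This is only an illustration under simplifying assumptions (no namespace declaration, no doctype, and build_paragraph is a made-up helper), not a reference implementation of either serialization:

```python
import xml.etree.ElementTree as ET


def build_paragraph(text, tail):
    """Build <p>text<br>tail</p> as a tree instead of a string."""
    p = ET.Element("p")
    p.text = text
    br = ET.SubElement(p, "br")
    br.tail = tail
    return p


tree = build_paragraph("Fish & chips", "done")

# Same tree, two serializations: the HTML one leaves void elements
# unclosed, the XML one uses the empty-element syntax.
as_html = ET.tostring(tree, encoding="unicode", method="html")
as_xml = ET.tostring(tree, encoding="unicode", method="xml")

print(as_html)
print(as_xml)
```

The serializer also takes care of escaping the ampersand, which is exactly the kind of detail that string concatenation tends to get wrong.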

JMTCW anyway.

-Mike Schinkel
http://www.mikeschinkel.com/blogs/
http://www.welldesignedurls.org/




Re: [whatwg] lang vs. xml:lang; id vs. xml:id

2006-12-01 Thread Michel Fortin

Le 1 déc. 2006 à 11:44, Ian Hickson a écrit :


On Fri, 1 Dec 2006, Michel Fortin wrote:


Okay, so if I understand well, xml:lang in the spec refers to the lang
attribute in the xml namespace, not to the xml:lang attribute in the
null namespace that you get with the HTML parser. It makes sense from a
DOM perspective, but it's misleading from a markup perspective, so I
still think it should be clarified.


Could you propose some text?


What about adding at the end of this paragraph:

If both the xml:lang attribute and the lang attribute are set, user  
agents must use the xml:lang attribute, and the lang attribute must  
be ignored for the purposes of determining the element's language.


the following sentence:

Note that the xml:lang attribute can only be set via scripting for  
HTML documents, since the HTML parser does not handle namespaces.


I guess that new sentence is totally obvious when you've read the  
Terminology section, but I still think it's important because  
xml:lang is used a lot in XHTML1 documents served as text/html, and  
people will be referring to this part of the spec to know what  
browsers do about them so it ought to be clear.
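
The distinction can be illustrated with two off-the-shelf parsers. In this sketch Python's html.parser stands in for a namespace-unaware HTML parser and xml.etree.ElementTree for a namespace-aware XML parser (both are assumptions chosen for convenience, and the AttrNames class is invented for the example):

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace that the reserved "xml" prefix is bound to in XML.
XML_NS = "http://www.w3.org/XML/1998/namespace"


class AttrNames(HTMLParser):
    """Collect attribute names exactly as the HTML tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.extend(name for name, _value in attrs)


markup = '<p xml:lang="fr" lang="en">Bonjour</p>'

html_parser = AttrNames()
html_parser.feed(markup)
html_parser.close()
html_names = html_parser.names                   # plain names, colon included
xml_names = set(ET.fromstring(markup).attrib)    # namespace-qualified names

print(html_names)
print(xml_names)
```

The HTML side sees an attribute whose name happens to contain a colon; only the XML side sees a lang attribute in the xml namespace.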



Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/




Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Mike Schinkel wrote:
 
 Even though they are both serializations, the vast majority of people 
 producing HTML/XHTML are not doing it by serializing, they are doing it 
 by string concatenation and merging templates. Unfortunately, no matter 
 how much it's lamented that this is the wrong way to do it, it's not 
 going to change by a significant amount and hence it would seem to me to 
 be the enlightened thing to acknowledge and strive to converge HTML with 
 XHTML over time, as much as reasonably possible.

There is an underlying assumption here, namely that there would be 
something wrong with picking one or the other and just using that.

If you want to use XHTML, then use XHTML, send it with an XML MIME type, 
and be happy.

If you want to use HTML, then use HTML, send it with an HTML MIME type, 
and be happy.

You don't need to do one or the other. It's just up to you which you do. 
Neither is better or worse than the other. They are equivalent, neither is 
deprecated, they are both unambiguous, they are both strict, they will 
both have validators and they will both have tools that can be used to 
process them. There's no reason to try and do both.


 Another very beneficial thing would be to ensure there are reference 
 implementations of open source or public domain serializers for XHTML 
 and HTML as part of the spec in all major languages and platforms.

There will be tools available in due course. Right now it's still early 
days; the spec is in flux, so implementations would have to do a lot of 
work to keep track.



Re: [whatwg] xml:lang and xmlns in HTML

2006-12-01 Thread Michel Fortin

Le 1 déc. 2006 à 11:07, Ian Hickson a écrit :


On Fri, 1 Dec 2006, Michel Fortin wrote:


I wonder if xml:lang and xmlns couldn't be made legal in HTML. xml:lang
would simply become conformant in HTML as a synonym for the lang
attribute, it's already in the spec that it should get the correct
treatment anyway.


Except that wouldn't be backwards compatible since xml:lang= isn't
treated as a language attribute in legacy UAs.


Yes I see. At the time I thought the spec required xml:lang to work  
in HTML, because of the way xml:lang is mentioned in the section  
about the lang attribute. Now I see it's the lang attribute in the  
xml namespace that would work, not the xml:lang attribute HTML  
would have.


But I think the reverse could work: xml:lang cannot work in HTML, but  
lang (html:lang) does work in XHTML if I'm not mistaken (although it's  
non-conforming).




This would make it possible to have documents conformant with both
syntaxes at the same time.


I thought XHTML-sent-as-text/html had explained in painful detail why
that's not a desirable end goal. Why would we want this?


I don't want to send XHTML as text/html. I want to see if it's
possible to have a common subset between HTML and XHTML at the markup
level, so that someone can create a document that conforms to both
XHTML and HTML.


I'm not sure if this is desirable or not, that's why I was asking for  
opinions. I see that it may also be completely irrelevant, but I  
don't really know what to think.




This could also help reinforce the idea that it's the media type that
differentiates HTML from XHTML. It'd make many valid XHTML1 documents out
there conformant with HTML5 with a mere modification to the doctype.


Not if they use things like <![CDATA[...]]> or the empty element syntax on
non-void elements, or any number of other XMLisms.


Well, by "out there" I meant all the XHTML1 documents that are built
for text/html, that validate and which only use features that both
parsers can handle. This certainly does not include <![CDATA[...]]>.


Sorry if I wasn't clear; "out there" was certainly a misnomer.



What do you think?


I don't think it's a goal for the two serialisations to have a common
subset.


That's fine with me.


Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/




Re: [whatwg] xml:lang and xmlns in HTML

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Michel Fortin wrote:
 
 Yes I see. At the time I thought the spec required xml:lang to work in 
 HTML, because of the way xml:lang is mentioned in the section about the 
 lang attribute. Now I see it's the lang attribute in the xml 
 namespace that would work, not the xml:lang attribute HTML would have.

Right.


 But I think the reverse could work: xml:lang cannot work in HTML, but 
 lang (html:lang) does work in XHTML if I'm not mistaken (although it's 
 non-conforming).

Correct.


   This would make it possible to have documents conformant with both 
   syntaxes at the same time.
  
  I thought XHTML-sent-as-text/html had explained in painful detail why 
  that's not a desirable end goal. Why would we want this?
 
 I don't want to send XHTML as text/html. I want to see if it's possible 
 to have a common subset between HTML and XHTML at the markup level, so 
 that someone can create a document that conforms to both XHTML and 
 HTML.

Ah. I think it is technically possible with the exception of the namespace 
declaration, but I don't know that it is a useful subset.

(I mean, it is technically possible today to create a document that is 
both an HTML4-compliant document and an XHTML1-compliant document at the 
same time, but again, that doesn't seem very useful.)


   This could also help reinforce the idea that it's the media type 
   that differentiates HTML from XHTML. It'd make many valid XHTML1 
   documents out there conformant with HTML5 with a mere modification 
   to the doctype.
  
  Not if they use things like <![CDATA[...]]> or the empty element 
  syntax on non-void elements, or any number of other XMLisms.
 
 Well, by "out there" I meant all the XHTML1 documents that are built for 
 text/html, that validate and which only use features that both 
 parsers can handle. This certainly does not include <![CDATA[...]]>.

Ah, yes. To convert an Appendix-C-compliant document to HTML5, one would 
just need to change the DOCTYPE and drop the namespace declaration and one 
would be pretty close (there might be some other esoteric things to 
change, but probably not many). That should be reasonably easy, probably 
just a search-and-replace or a template change.
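
A sketch of what such a search-and-replace might look like, assuming the target is the short `<!DOCTYPE html>` form and that the source uses a standard XHTML 1.0 or 1.1 public identifier (the regular expressions and the function name are illustrative assumptions; the "other esoteric things" are deliberately not handled):

```python
import re

# Matches an XHTML 1.0/1.1 doctype with its public and system identifiers.
XHTML1_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML 1\.[01][^"]*"\s+"[^"]*"\s*>',
    re.IGNORECASE,
)
# Matches the XHTML namespace declaration on the root element.
XMLNS_DECL = re.compile(r'\s+xmlns\s*=\s*"http://www\.w3\.org/1999/xhtml"')


def appendix_c_to_html5(source):
    """Swap the doctype and drop the namespace declaration, nothing else."""
    source = XHTML1_DOCTYPE.sub("<!DOCTYPE html>", source, count=1)
    source = XMLNS_DECL.sub("", source, count=1)
    return source


sample = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en">'
    "<head><title>t</title></head></html>"
)
converted = appendix_c_to_html5(sample)
print(converted)
```

For a site built from templates, the same two substitutions would more likely be made once in the template than run over every generated page.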


Cheers,


Re: [whatwg] PaceEntryMediatype

2006-12-01 Thread Ian Hickson
On Sat, 2 Dec 2006, Thomas Broyer wrote:
 2006/12/1, Ian Hickson:
  On Fri, 1 Dec 2006, Thomas Broyer wrote:
  
   A summary of my problem with HTML5's autodiscovery: - there 
   shouldn't be a 'rel' value for subscribability, subscribability is 
   a matter of whether and how an UA can process content from a 
   particular media type
 
  Agreed. The spec doesn't mention subscribing, just that rel=feed means 
  it's a syndication feed.
 
 And what is a syndication feed, if not something that's 
 subscribable?

 I mean, there is no definition of syndication feed, neither of feed 
 autodiscovery (what's the purpose of feed autodiscovery, if not to 
 subscribe to such feeds?)
 
 In that sense, I really do think the spec is mentioning subscribing.

Oh. If you just mean that you don't think there should be a way to say 
that a particular document is a syndication feed, then I disagree. I would 
assert that the popularity of feed readers such as Bloglines, Google 
Reader, and so forth, is evidence that many other people find this feature 
useful as well.


 With my proposal, existing content would still be found by feed 
 autodiscovery, it would just be semantically incorrect in many cases 
 (from an entry page, when linking to the feed containing the entry 
 with rel=alternate; the feed is not an alternate to the entry; the use 
 of rel=alternate was just a hack to display the orange icon).

So you're proposing making the hundreds of millions of existing instances 
of syndication feed links non-conforming?

That seems about equivalent to closing the barn door after the horse has 
bolted, as they say.


 [...]
 
 I hope I clarified my opinion.

Actually I'm even more confused now than before. Could you propose exact 
normative replacement text for the specification that would make you 
happy? In doing so, please consider these constraints:

 * We cannot define anything to do with the user interface, only the 
   meaning of the link relationships, because user agents must be allowed 
   to innovate in user interfaces (basically, only interoperability can 
   be ensured, not homogeneity).

 * We don't want to break existing practices. If something is 
   interoperably implemented and widely used, then it should continue to 
   work in the same way.

 * The specification should be kept as simple as possible.



Re: [whatwg] PaceEntryMediatype

2006-12-01 Thread Thomas Broyer

2006/12/1, Mark Baker:

Urgh, sorry for my tardiness; I'm falling behind on my reading.

On 11/30/06, Thomas Broyer wrote:
 I'd prefer basing autodiscovery on the media types and not at all on
 the relationships.

All a media type tells you (non-authoritatively too) is the spec you
need to interpret the document at the other end of the link.  That has
very little to do with the reasons that you might want to follow the
link, subscribe to it, etc.  Which is why you need a mechanism
independent from the media type.  Like link types.


See the mail I just sent in response to Ian.


Consider hAtom.  If you went by media types alone, you'd be confronted with;

<link type="text/html" href="hatom.html" />

Not particularly useful for subscription (or anything else for that
matter) is it?


How does hatom.html relate to the current page? Is it an alternate?
Is it a container (rel=up, rel=index)? Why would I subscribe to
such a thing if I don't know what it is about?
(Also, note that rel= is required for link elements.)


This would be better;

<link rel="feed" type="text/html" href="hatom.html" />


It still doesn't tell me what it has to do with the page I'm looking at.

I do agree there is a problem in these cases, and that's why I
originally proposed keeping a rel=feed, but with a clear definition
as a relationship (as opposed to a kind of resource I'm linking to).


Autodiscovery should ideally be based primarily on link types, and
only secondarily - as an optimization - on media types.  Even this
should work;

<link rel="feed" href="hatom.html" />


As long as hatom.html is a feed where the current page is (or has
been) linked to as an item.
If you are already looking at hatom.html, your hAtom-aware browser
should already provide you with a "subscribe to this page"
link/button/etc.
If you can't describe the relationship between the current page and
hatom.html, there is little chance that this is a resource of interest
and that the person reading the page will subscribe to it (at least
without visiting it).

With rel=feed as a real relationship (à la rel=index),
autodiscovery can be (as it should have already been) based on media
types (am I able to subscribe to such a thing?) *or* rel=feed, with
an equal priority.
If it appears that my proposed rel=feed really is identical to
rel=index, then a new means should be found (e.g. a new attribute:
<link rel="index" href="hatom.html" type="text/html" subscribable>).

Saying "this is something you can subscribe to (it's a feed)" is not
talking about relationships. On the contrary, saying "this is an
'index' and it incidentally is something you can subscribe to (it's a
feed)", either by using the 'type' attribute or a hypothetical
'subscribable' attribute, is.
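
As a rough illustration of the rule argued for above (a link qualifies
either because the UA can subscribe to its media type, or because it
carries rel=feed, with equal priority), here is a minimal Python sketch.
The set of subscribable media types, the discover() helper and the sample
page are all assumptions made up for the example; nothing here is taken
from the HTML5 draft or from any real aggregator.

```python
from html.parser import HTMLParser

# Assumption: the media types this hypothetical UA knows how to subscribe to.
SUBSCRIBABLE_TYPES = {"application/atom+xml", "application/rss+xml"}


class LinkCollector(HTMLParser):
    """Collect the attributes of every <link> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            # Valueless attributes are reported as None; normalise to "".
            self.links.append({name: (value or "") for name, value in attrs})


def discover(html):
    """Return hrefs that qualify by media type OR by rel=feed."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    found = []
    for link in collector.links:
        rels = link.get("rel", "").lower().split()
        media_type = link.get("type", "").split(";")[0].strip().lower()
        if media_type in SUBSCRIBABLE_TYPES or "feed" in rels:
            found.append(link.get("href", ""))
    return found


page = """
<link rel="alternate" type="application/atom+xml" href="feed.atom">
<link rel="feed" href="hatom.html">
<link rel="stylesheet" type="text/css" href="style.css">
"""
print(discover(page))
```

Note that the set of subscribable types lives in the UA, not in the page
or the spec, which is the point being made about not assuming
subscribability from within the specification.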

--
Thomas Broyer


Re: [whatwg] xml:lang and xmlns in HTML

2006-12-01 Thread Robert Sayre

On 12/1/06, Ian Hickson [EMAIL PROTECTED] wrote:


 What do you think?

I don't think it's a goal for the two serialisations to have a common
subset.


I want to cut and paste MathML and SVG and other things into my web
pages. I think I understand the difference between the XML and HTML5
serializations better than the average person, but I still want to do
it.

I don't care about XML, or what the subset ends up being. I want to
use some existing XML application vocabularies in an HTML5 DOM.

There are certainly risks inherent in endorsing this, but I think most
of them are social. Are we worried that this will result in an unruly
population of extensions and a Tower of Babel problem?

--

Robert Sayre


Re: [whatwg] xml:lang and xmlns in HTML

2006-12-01 Thread Henri Sivonen

On Dec 2, 2006, at 01:14, Ian Hickson wrote:


To convert an Appendix-C-compliant document to HTML5, one would
just need to change the DOCTYPE and drop the namespace declaration
and one would be pretty close (there might be some other esoteric
things to change, but probably not many).


Bimorphic content models instead of %Flow.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] xml:lang and xmlns in HTML

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Robert Sayre wrote:
 
 I want to cut and paste MathML and SVG and other things into my web 
 pages.

Then you'll have to use the XML variant and the XML MIME type.

(Similarly, if you want to use JavaScript code snippets, you have to use 
JavaScript and can't use, say, C++ or VBScript.)


 I don't care about XML, or what the subset ends up being. I want to use 
 some existing XML application vocabularies in an HTML5 DOM.

The only ways to use namespaces outside of HTML right now with HTML5 DOMs 
is to either use the XML serialisation, or use a lot of very verbose 
JavaScript with DOM manipulation.


 There are certainly risks inherent in endorsing this, but I think most 
 of them are social. Are we worried that this will result in an unruly 
 population of extensions and a Tower of Babel problem?

Namespaces aren't supported by the tag soup processors. If you want to 
use namespaces, you have to use XML.
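
To make the distinction concrete, here is a small Python sketch that feeds
the same snippet to two parsers. Python's html.parser is used only as a
stand-in for a tag soup processor (an assumption for illustration; it is
not what any browser does), and xml.etree.ElementTree stands in for an XML
parser. The TagLogger class and the SNIPPET string are made up for the
example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

SNIPPET = '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'


class TagLogger(HTMLParser):
    """Record each start tag with its attributes, as a tag soup parser sees it."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))


# Tag soup view: xmlns is just another attribute on an element called "math".
soup = TagLogger()
soup.feed(SNIPPET)
soup.close()
print(soup.seen[0])

# XML view: the declaration is consumed and the element name is namespaced.
root = ET.fromstring(SNIPPET)
print(root.tag)
print(root.attrib)
```

In the first case the xmlns attribute carries no meaning beyond its name
and value; only the XML parser turns it into a namespace on the element.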

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] xml:lang and xmlns in HTML

2006-12-01 Thread Robert Sayre

On 12/1/06, Ian Hickson [EMAIL PROTECTED] wrote:

On Fri, 1 Dec 2006, Robert Sayre wrote:

 I want to cut and paste MathML and SVG and other things into my web
 pages.

Then you'll have to use the XML variant and the XML MIME type.


Why? I don't care if features that rely on XML serialization break.



The only ways to use namespaces outside of HTML right now with HTML5 DOMs
is to either use the XML serialisation, or use a lot of very verbose
JavaScript with DOM manipulation.

...

Namespaces aren't supported by the tag soup processors. If you want to
use namespaces, you have to use XML.



I don't want to use namespaces. I want to use an xmlns attribute.

--

Robert Sayre


Re: [whatwg] Valid Unicode

2006-12-01 Thread Elliotte Harold

Henri Sivonen wrote:


6. Are noncharacters U+FDD0..U+FDEF allowed (?)
7. Are the noncharacters from the last two characters of each plane 
allowed (?)


I don't have particularly strong feelings here. Putting those characters 
in HTML is a bad idea, but allowing them is not a problem for HTML5 to 
XHTML5 conversion and they aren't a common problem like C1 controls.


FFFE and FFFF are specifically forbidden by XML so they should probably 
be forbidden here too. I think the others are allowed.
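
For reference, the two sets under discussion can be written down as a
short Python sketch. The function names are made up for the example; the
ranges are the Unicode noncharacters (U+FDD0..U+FDEF plus the last two
code points of each plane) and the Char production of XML 1.0.

```python
def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_xml10_char(cp):
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Print (code point, is noncharacter, is allowed in XML 1.0) for a few samples.
for cp in (0xFDD0, 0xFFFE, 0xFFFF, 0x1FFFE):
    print(f"U+{cp:04X}", is_noncharacter(cp), is_xml10_char(cp))
```

Under these definitions only U+FFFE and U+FFFF fall outside what XML 1.0
allows; the other noncharacters, including the plane-final pairs above the
BMP, are inside the Char ranges.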


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/


Re: [whatwg] xml:lang and xmlns in HTML

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Robert Sayre wrote:
 On 12/1/06, Ian Hickson [EMAIL PROTECTED] wrote:
  On Fri, 1 Dec 2006, Robert Sayre wrote:
  
   I want to cut and paste MathML and SVG and other things into my web
   pages.
  
  Then you'll have to use the XML variant and the XML MIME type.
 
 Why?

Because MathML and SVG are XML languages. I gave an apt analogy in my 
previous e-mail: if you want to use JavaScript code snippets, you have to 
use JavaScript and can't use, say, C++ or VBScript. If you want to use XML 
languages, then you have to use XML and can't use other formats like HTML 
or n3 or JSON.


 I don't care if features that rely on XML serialization break.

I *really* don't understand what you're asking for. You want to be able to 
use some features but don't care if they work or not?


  The only ways to use namespaces outside of HTML right now with HTML5 DOMs
  is to either use the XML serialisation, or use a lot of very verbose
  JavaScript with DOM manipulation.
 ...
  Namespaces aren't supported by the tag soup processors. If you want to
  use namespaces, you have to use XML.
 
 I don't want to use namespaces. I want to use an xmlns attribute.

The only possible reasoning I could see for such a strange request would 
be if you intended to try and parse HTML documents using an XML parser. 
But then that makes no sense -- if you wanted to use an XML parser, then 
you would just use XML. Why would you not want to use XML if you wanted to 
use an XML parser?

Could you explain the use case for xmlns= if you don't actually want 
namespaces? It may be that you are assuming a solution to a problem I 
don't understand, and that there may be some other solution to the problem 
that makes more sense.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-12-01 Thread Mike Schinkel
Lachlan Hunt wrote:
 HTML and XML have significantly different parsing requirements 
 and they absolutely must be treated as significantly different 
 file formats.  Any attempt to treat them as the same format is 
 an extremely bad idea.
 ...
 This is why the spec is defined in terms of the DOM, so that 
 there can be both HTML and XHTML serialisations of the same 
 document, rather than defining that both serialisations are the 
 same syntax.

But please take into consideration that almost nobody writes web pages using
a DOM; they write web pages using text editors and dynamically using string
concatenation. As such there is great value for users in having them be as
similar as possible. If they diverge, it will accelerate chaos on the web.
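
A small Python sketch of what the difference looks like for hand-written
markup. Python's html.parser is used as a stand-in for a lenient HTML
parser and xml.etree.ElementTree for an XML parser; both choices, and the
helper names, are assumptions made for illustration only.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class StartTagCollector(HTMLParser):
    """Record the name of every start tag the lenient parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(markup):
    """Start tags seen by the lenient parser, in document order."""
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.start_tags


def xml_well_formed(markup):
    """True if the XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Compare the two parsers on the same element, with and without the slash.
for markup in ("<p>one<br>two</p>", "<p>one<br/>two</p>"):
    print(markup, html_start_tags(markup), xml_well_formed(markup))
```

The lenient parser reports the same elements for both spellings, while the
XML parser accepts only the one with the trailing slash, which is why a
page assembled by string concatenation can work in one format and fail in
the other.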

-Mike Schinkel
http://www.mikeschinkel.com/blogs/
http://www.welldesignedurls.org/