Re: [whatwg] 9.1.2.1: trailing slash and atheism

2006-12-02 Thread Christian Schmidt

Charles Iliya Krempeaux wrote:
> Sometimes web developers parse (non-XML) HTML with an XML parser
> because it's the tool they have on hand.
>
> Consider a PHP developer trying to analyse an HTML page.
>
> If a PHP developer wants to analyse an HTML page, that developer may
> try to use SimpleXML because that's what they have on hand and know
> how to use. There's no SimpleHTML available in PHP.
>
> And while this is certainly not our fault, this is a situation some
> web developers are going to run into. (What else are they going to
> use?)


PHP developers can parse HTML using DOMDocument::loadHTML(). If they
want, they can then convert the DOMDocument to SimpleXML:

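$doc = new DOMDocument();
$doc->loadHTML('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    . '"http://www.w3.org/TR/html4/loose.dtd">'
    . '<html><head><title>Foo</title></head><body>Foobar</body></html>');
// Convert the parsed DOM tree into a SimpleXML object rooted at <html>.
$simpleXml = simplexml_import_dom($doc);
print $simpleXml->head->title; // prints "Foo"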


Christian



Re: [whatwg] 9.1.2.1: trailing slash and atheism

2006-12-01 Thread Mike Schinkel
Ian Hickson wrote:
>>  Charles Iliya Krempeaux wrote:
>>  > Sometimes web developers parse (non-XML) HTML with an XML parser
>>  > because it's the tool they have on hand.
>>  The solution to this is to provide better tools (which is already
>>  happening, by the way -- if people want to join the effort, #whatwg on
>>  Freenode is a good place to start), not to make the language slightly
>>  more compatible with a fundamentally broken approach.

That approach of saying "better tools should be provided" will work if and
only if the people doing the specifying also *ensure* that there are tested,
working tools freely available in the public domain on all major platforms,
and that those tools are easy enough for the lay person to use in (almost)
all contexts.

-Mike Schinkel
http://www.mikeschinkel.com/blogs/
http://www.welldesignedurls.org/




Re: [whatwg] 9.1.2.1: trailing slash and atheism

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Charles Iliya Krempeaux wrote:
> > 
> > This section has nothing to do with XML. If the document was parsed by 
> > an XML parser, then there are much bigger problems afoot, such as MIME 
> > type mislabelling, or a faulty UA.
> 
> Sometimes web developers parse (non-XML) HTML with an XML parser because 
> it's the tool they have on hand.

The solution to this is to provide better tools (which is already 
happening, by the way -- if people want to join the effort, #whatwg on 
Freenode is a good place to start), not to make the language slightly more 
compatible with a fundamentally broken approach.

Some people use books as paperweights. What you are asking is equivalent 
to asking bookmakers to make their books heavier and less prone to opening 
in air currents, because then they would be better paperweights.


> If a PHP developer wants to analyse an HTML page, that developer may try
> to use SimpleXML because that's what they have on hand and know how to
> use. There's no SimpleHTML available in PHP.

As part of HTML5, we will have to provide these tools. Writing a spec for 
how to parse HTML was the first step.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] 9.1.2.1: trailing slash and atheism

2006-12-01 Thread Charles Iliya Krempeaux

Hello Ian,

On 12/1/06, Ian Hickson <[EMAIL PROTECTED]> wrote:


> On Fri, 1 Dec 2006, Elliotte Harold wrote:
> >
> > 9.1.2.1 states:
> >
> > Then, if the element is one of the void elements, then there may be a
> > single U+002F SOLIDUS character. This character has no effect [...]
> >
> > The second sentence is false [...] I suggest rewriting as follows:
> >
> > This character has no effect when the document is parsed by an HTML5
> > parser.
>
> That's redundant. Parsing a document using this syntax with anything other
> than an HTML5 parser would be non-conforming.
>
> > However, if the document is parsed by an XML parser, the trailing
> > slash converts the tag into an empty-element tag, and thereby makes an
> > otherwise malformed element well-formed.
>
> This section has nothing to do with XML. If the document was parsed by an
> XML parser, then there are much bigger problems afoot, such as MIME type
> mislabelling, or a faulty UA.



Sometimes web developers parse (non-XML) HTML with an XML parser because
it's the tool they have on hand.

Consider a PHP developer trying to analyse an HTML page.

If a PHP developer wants to analyse an HTML page, that developer may try to
use SimpleXML because that's what they have on hand and know how to use.
There's no SimpleHTML available in PHP.

And while this is certainly not our fault, this is a situation some web
developers are going to run into. (What else are they going to use?)


See ya

--
   Charles Iliya Krempeaux, B.Sc.

   charles @ reptile.ca
   supercanadian @ gmail.com

   developer weblog: http://ChangeLog.ca/


Re: [whatwg] 9.1.2.1: trailing slash and atheism

2006-12-01 Thread Ian Hickson
On Fri, 1 Dec 2006, Elliotte Harold wrote:
>
> 9.1.2.1 states:
> 
> Then, if the element is one of the void elements, then there may be a single
> U+002F SOLIDUS character. This character has no effect [...]
> 
> The second sentence is false [...] I suggest rewriting as follows:
> 
> This character has no effect when the document is parsed by an HTML5 parser.

That's redundant. Parsing a document using this syntax with anything other 
than an HTML5 parser would be non-conforming.


> However, if the document is parsed by an XML parser, the trailing 
> slash converts the tag into an empty-element tag, and thereby makes an 
> otherwise malformed element well-formed.

This section has nothing to do with XML. If the document was parsed by an 
XML parser, then there are much bigger problems afoot, such as MIME type 
mislabelling, or a faulty UA.

Cheers,


Re: [whatwg] 9.1.2.1: trailing slash and atheism

2006-12-01 Thread Anne van Kesteren
On Fri, 01 Dec 2006 13:09:24 +0100, Elliotte Harold  
<[EMAIL PROTECTED]> wrote:
> This character has no effect when the document is parsed by an HTML5
> parser. However, if the document is parsed by an XML parser, the
> trailing slash converts the tag into an empty-element tag, and thereby
> makes an otherwise malformed element well-formed.


You're still not getting it, are you?


--
Anne van Kesteren




Re: [whatwg] 9.1.2.1: trailing slash and atheism

2006-12-01 Thread James Graham

Elliotte Harold wrote:

> This character has no effect when the document is parsed by an HTML5
> parser. However, if the document is parsed by an XML parser, the
> trailing slash converts the tag into an empty-element tag, and thereby
> makes an otherwise malformed element well-formed.


If you're trying to parse an HTML5 document with an XML parser you're doing 
something really screwy anyway.


--
"Eternity's a terrible thought. I mean, where's it all going to end?"
 -- Tom Stoppard, Rosencrantz and Guildenstern are Dead


[whatwg] 9.1.2.1: trailing slash and atheism

2006-12-01 Thread Elliotte Harold

9.1.2.1 states:

Then, if the element is one of the void elements, then there may be a 
single U+002F SOLIDUS character. This character has no effect except to 
appease the markup gods. As this character is therefore just a symbol of 
faith, atheists should omit it.


The second sentence is false, and also likely to cause unnecessary 
conflict with fundamentalists who don't understand markup and don't get 
the joke. But mostly it's false. I suggest rewriting as follows:


This character has no effect when the document is parsed by an HTML5 
parser. However, if the document is parsed by an XML parser, the 
trailing slash converts the tag into an empty-element tag, and thereby 
makes an otherwise malformed element well-formed.
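A minimal sketch of the XML-side behaviour the proposed wording describes, assuming PHP's libxml2-backed SimpleXML stands in for "an XML parser" (the helper name parsesAsXml is hypothetical, chosen only for illustration): a void element written without the trailing slash leaves the fragment malformed, while the same element written as <br/> is an empty-element tag and the fragment parses.

```php
<?php
// Sketch only: SimpleXML (libxml2) is assumed as the XML parser under test.
// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: returns true if $markup is well-formed XML.
function parsesAsXml(string $markup): bool
{
    // simplexml_load_string() returns false when the XML is malformed.
    $result = simplexml_load_string($markup);
    libxml_clear_errors();
    return $result !== false;
}

$withoutSlash = '<p>first line<br>second line</p>';
$withSlash    = '<p>first line<br/>second line</p>';

var_dump(parsesAsXml($withoutSlash)); // bool(false): <br> is never closed
var_dump(parsesAsXml($withSlash));    // bool(true): <br/> is an empty-element tag
```

This only demonstrates the XML parser's side; it says nothing about how an HTML5 parser treats the slash.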


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/