Re: [whatwg] Proposal for window.DocumentType.prototype.toString

2012-10-29 Thread Ian Hickson
On Mon, 29 Oct 2012, Johan Sundström wrote:
> 
> Serializing a complete HTML document DOM to a string is surprisingly 
> hard in javascript. As a fairly seasoned javascript hacker I figured 
> this might do it:
> 
>   document.doctype + document.documentElement.outerHTML
>
> It doesn't. No browser has a useful window.DocumentType.prototype.toString
> that returns either the original document's <!DOCTYPE> before parsing –
> or a semantically equivalent post-parsing one.

If you know the document is always going to be in the no-quirks mode, then 
you can just stick "<!DOCTYPE html>" at the start. If you need to be able 
to tell what the mode is but are ok with ignoring the "limited quirks" 
mode, then you can use document.compatMode to pick whether to use that 
string or none, as in:

   (document.compatMode == 'CSS1Compat' ? '<!DOCTYPE html>' : '') +
   document.documentElement.outerHTML

That will drop any comment nodes around the root element, in case that 
matters. If you want to get the actual DOCTYPE strings, you can make a 
simple serialisation function for doctype nodes that uses the three 
attributes on that object to string together the full thing (much as you 
do in the polyfill you mentioned).
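A minimal sketch of such a serialisation function, assuming the three 
attributes meant here are name, publicId and systemId. The function names 
serializeDoctype and serializeDocument are made up for illustration, and the 
second one walks the document's child nodes so that comments around the root 
element are kept. It does not escape quotes inside the identifiers and it 
ignores the internal subset:

```javascript
// Hypothetical helper: build a DOCTYPE string from a doctype-like object.
// Assumes the object exposes name, publicId and systemId as strings.
function serializeDoctype(doctype) {
  if (!doctype) return "";
  let out = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    out += ' PUBLIC "' + doctype.publicId + '"';
  } else if (doctype.systemId) {
    // A system identifier without a public one needs the SYSTEM keyword.
    out += " SYSTEM";
  }
  if (doctype.systemId) {
    out += ' "' + doctype.systemId + '"';
  }
  return out + ">";
}

// Hypothetical helper: serialise every top-level child of a document-like
// object, so comments before and after the root element are not dropped.
function serializeDocument(doc) {
  let out = "";
  for (const node of Array.from(doc.childNodes)) {
    if (node.nodeType === 10) {
      out += serializeDoctype(node); // DOCUMENT_TYPE_NODE
    } else if (node.nodeType === 8) {
      out += "<!--" + node.data + "-->"; // COMMENT_NODE
    } else if (node.nodeType === 1) {
      out += node.outerHTML; // ELEMENT_NODE
    }
  }
  return out;
}
```

In a page this would be called as serializeDocument(document), or as 
serializeDoctype(document.doctype) + document.documentElement.outerHTML if 
the surrounding comments do not matter.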


> I believe only Firefox implements "internalSubset" today

Since the "internal subset" has no meaning in text/html, that's ok if your 
goal is just to be semantically equivalent.


> The most useful implementation would IMO be a native one that 
> reproduces the doctype, as it was formatted in the source document.

What's your use case, exactly?


On Mon, 29 Oct 2012, Boris Zbarsky wrote:
> 
> I thought there were plans to put innerHTML on Document.  Did that go 
> nowhere?

Lack of implementor interest killed it a while ago.


On Mon, 29 Oct 2012, Ojan Vafai wrote:
> On Mon, Oct 29, 2012 at 6:17 PM, Boris Zbarsky wrote:
> >
> > I thought there were plans to put innerHTML on Document.  Did that go 
> > nowhere?
> 
> There were plans to put it on DocumentFragment.

That was a different plan, but yes, there have also been proposals to do 
that. This was in the context of templates; a better solution to which has 
since been worked on in public-webapps.

-- 
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.

Re: [whatwg] Proposal for window.DocumentType.prototype.toString

2012-10-29 Thread Ojan Vafai
On Mon, Oct 29, 2012 at 6:17 PM, Boris Zbarsky wrote:

> On 10/29/12 8:58 PM, Johan Sundström wrote:
>
>> Serializing a complete HTML document DOM to a string is surprisingly
>> hard in javascript.
>>
>
> I thought there were plans to put innerHTML on Document.  Did that go
> nowhere?


There were plans to put it on DocumentFragment. But IIRC no other browser
vendors voiced an interest and Hixie was opposed because he thought it
would encourage people to do more string-based DOM building. The WebKit
patch for this floundered as a result. I still think it's a good idea.


Re: [whatwg] Proposal for window.DocumentType.prototype.toString

2012-10-29 Thread Boris Zbarsky

On 10/29/12 8:58 PM, Johan Sundström wrote:

> Serializing a complete HTML document DOM to a string is surprisingly
> hard in javascript.

I thought there were plans to put innerHTML on Document.  Did that go 
nowhere?

> As a fairly seasoned javascript hacker I figured
> this might do it:
>
>    document.doctype + document.documentElement.outerHTML

This seems lossy in many cases (most obviously: when the HTML uses 
conditional comments, though there are also various XHTML-specific issues).

> The most useful implementation would IMO be a native one
> that reproduces the doctype, as it was formatted in the source
> document.


That might be worth doing independent of the serialization issue.

-Boris


[whatwg] Proposal for window.DocumentType.prototype.toString

2012-10-29 Thread Johan Sundström
Hi everybody!

Serializing a complete HTML document DOM to a string is surprisingly
hard in javascript. As a fairly seasoned javascript hacker I figured
this might do it:

  document.doctype + document.documentElement.outerHTML

It doesn't. No browser has a useful window.DocumentType.prototype.toString
that returns either the original document's <!DOCTYPE> before parsing –
or a semantically equivalent post-parsing one. Google Chrome shows one
in its devtools, but seems not to export some way of getting at it to
programmers.

My proposal is we specify this more useful behaviour for
javascript-running browsers, so it does become as simple as above. A
rough sketch of how a polyfill might implement the latter
window.DocumentType.prototype.toString:

  https://gist.github.com/3977584

Even as a polyfill, the above is rather limited, though:  I believe
only Firefox implements "internalSubset" today, and probably only in
XML contexts. The most useful implementation would IMO be a native one
that reproduces the doctype, as it was formatted in the source
document.

Thoughts?

-- 
 / Johan Sundström, http://ecmanaut.blogspot.com/


Re: [whatwg] URL: file: URLs

2012-10-29 Thread Boris Zbarsky

On 10/29/12 10:53 AM, Anne van Kesteren wrote:

> But at that point in a URL you cannot have a path. A path starts with
> a slash after the host.


The point is that on Windows, Gecko parses file://c:/something as 
file:///c:/something


As in, it's an exception to the general "if there are two slashes after 
the 'file:' then the next thing is a host" rule.
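To make the exception concrete, here is a rough sketch of the rewrite 
described above. The function name moveDriveLetterToPath is made up for 
illustration, and this is not Gecko's actual code; it only shows the idea 
that a single letter followed by a colon in the host position gets treated 
as the start of the path:

```javascript
// Hypothetical helper: if a file: URL has a drive letter where the host
// would normally be, move it into the path by adding a third slash.
function moveDriveLetterToPath(input) {
  // "file://" + one ASCII letter + ":" + either a path or the end of input.
  const match = /^file:\/\/([A-Za-z]):(\/.*)?$/i.exec(input);
  if (!match) {
    // Anything else (a real host, or already three slashes) is left alone.
    return input;
  }
  const rest = match[2] === undefined ? "/" : match[2];
  return "file:///" + match[1] + ":" + rest;
}
```

With this, "file://c:/something" becomes "file:///c:/something", while 
"file://test/" keeps test as the host.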



> I suppose, I would hate it though for new URL(...) to depend on the platform.


I'm not sure there are great solutions here.  :(

-Boris


Re: [whatwg] URL: file: URLs

2012-10-29 Thread Anne van Kesteren
On Mon, Oct 29, 2012 at 3:13 PM, Boris Zbarsky wrote:
> On 10/29/12 5:00 AM, Anne van Kesteren wrote:
>> Maybe I should introduce a "file host state" that supports colons in
>> the host name (or special case the "host state" further, but the
>> former seems cleaner).
>
> I don't think that's particularly desirable.  The "c:" is totally part of
> the path; treating it otherwise would just be confusing.  Imo.

But at that point in a URL you cannot have a path. A path starts with
a slash after the host. Especially if you want file://test/ to parse
with test being the host.


>> Most browsers seem to fail currently on input
>> such as "file://c:/" but this is on a Mac
>
> Yes, doing that on a Mac would just be wrong

I suppose, I would hate it though for new URL(...) to depend on the platform.


-- 
http://annevankesteren.nl/


Re: [whatwg] URL: file: URLs

2012-10-29 Thread Boris Zbarsky

On 10/29/12 5:00 AM, Anne van Kesteren wrote:

>> But note that it would be a bit odd if file://c:/ claimed to have a host of
>> "c" with a default port or some such...
>
> Maybe I should introduce a "file host state" that supports colons in
> the host name (or special case the "host state" further, but the
> former seems cleaner).

I don't think that's particularly desirable.  The "c:" is totally part 
of the path; treating it otherwise would just be confusing.  Imo.

> Most browsers seem to fail currently on input
> such as "file://c:/" but this is on a Mac

Yes, doing that on a Mac would just be wrong

> I would prefer having the parsing be consistent though.


You mean across Windows and non-Windows?  I'm not sure that's viable.

-Boris



Re: [whatwg] URL: file: URLs

2012-10-29 Thread Anne van Kesteren
On Sun, Oct 28, 2012 at 6:51 PM, Boris Zbarsky wrote:
> Same as the comment I quoted?  Or the same as something else?

Same as you quoted.


> Well, the Gecko parser preserves the host at this stage assuming the URI was
> correctly formatted with a host.  Again:
>
>   blah://foo/bar => blah://foo/bar
>
> The interesting things happen when you have 0, 1, or 3 slashes between ':'
> and "foo".  The handling of "foo" after this point is a separate issue.

Those are handled the same as in Gecko (this also matches Safari, I think;
Chrome strips all starting slashes, for example if you have four, but I did
not copy that).
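For anyone wanting to test the 0, 1, 2 or 3 slash cases, a small sketch 
that only counts the slashes between the scheme's ":" and whatever follows. 
The name countSlashesAfterScheme is made up for illustration, and the 
function says nothing about how any particular browser treats each case:

```javascript
// Hypothetical helper: count the slashes directly after "scheme:".
// Returns -1 when the input does not start with a scheme.
function countSlashesAfterScheme(input) {
  // A scheme is a letter followed by letters, digits, "+", "." or "-".
  const match = /^[A-Za-z][A-Za-z0-9+.-]*:(\/*)/.exec(input);
  return match ? match[1].length : -1;
}
```

For example, countSlashesAfterScheme("blah://foo/bar") gives 2, the 
correctly formatted case from the quoted text.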


> In Gecko, it's part of URL parsing.  More precisely, it's part of the
> normalization performed as part of constructing a "URL" object from a
> string.  Since this is also how we parse URLs, it's effectively all part of
> the package.
>
> But note that it would be a bit odd if file://c:/ claimed to have a host of
> "c" with a default port or some such...

Maybe I should introduce a "file host state" that supports colons in
the host name (or special case the "host state" further, but the
former seems cleaner). Most browsers seem to fail currently on input
such as "file://c:/" but this is on a Mac so maybe that's the
difference. I would prefer having the parsing be consistent though.


> 7 and 8 are not, though at some point we'll need to define equality
> comparisons anyway.

Yeah, I guess at some point someone would need to write a "processing
file: URLs" specification (for post-parsing operations). On the other
hand, it's not entirely clear to me that this needs to be interoperable.


-- 
http://annevankesteren.nl/