
Re: [whatwg] New URL Standard

2012-10-06 Thread Glenn Maynard

On Tue, Sep 25, 2012 at 10:25 PM, Ian Hickson  wrote:

> You could even make that work, by having a special method for appending a
> new key/value pair, and just not making it accessible.
>

Right, other access methods, like this or a classList-like array, can
always be added later.  (Actually, key/value pairs appended like this would
still be accessible with Tab's suggestion, it's just the resulting key
order that it doesn't expose.)

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-25 Thread Ian Hickson

On Tue, 25 Sep 2012, Glenn Maynard wrote:
> 
> What this doesn't allow is creating things like "a=1&b=2&a=3".  You can 
> create "a=1&a=2&b=3" (url.query.a = ["1","2"]; url.query.b = "3"), but 
> there's no way to split the keys (a, b, a).  This is the limitation we 
> were really talking about.  This seems unlikely to be a real problem, 
> and in the unlikely case where it's really needed, it seems fine to 
> require people to just fall back on formatting the query string 
> themselves and assign to url.search.

You could even make that work, by having a special method for appending a 
new key/value pair, and just not making it accessible.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] New URL Standard

2012-09-25 Thread Boris Zbarsky


On 9/25/12 11:15 PM, Glenn Maynard wrote:

> What this doesn't allow is creating things like "a=1&b=2&a=3"


Ah.  That should be relatively unlikely (though forms with checkboxes in 
them can in fact lead to query strings like that).


-Boris

Re: [whatwg] New URL Standard

2012-09-25 Thread Glenn Maynard

On Tue, Sep 25, 2012 at 9:53 PM, Boris Zbarsky  wrote:

> On 9/25/12 10:36 PM, Glenn Maynard wrote:
>
>> You usually don't care about the resulting order in that case, right?
>>
>
> It's not uncommon for servers to depend on a particular order of
> parameters in the query string and totally fail when the ordering is
> different.  Especially the sort of servers that have a .exe for their CGI
> instead of using an off-the-shelf CGI library.
>
>
>> When do you care about being able to specifically create (or
>> distinguish) "a=1&b=2" vs. "b=2&a=1"
>>
>
> Whenever the server will barf on one of them?  ;)
>

It's easy enough to allow creating a specific ordering of individual items,
by guaranteeing that when a key is assigned to the object, if that key
didn't already exist in the query, it will be added to the end.  That means
you can say

url.query.x = '1';
url.query.y = '2';

vs.

url.query.y = '2';
url.query.x = '1';

to create "x=1&y=2" and "y=2&x=1", respectively.  That's the behavior I'd
expect anyway.  (If the key already existed, it should replace it in its
previous position, of course, not bump it to the end.)
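
A minimal sketch of that ordering rule, assuming a hypothetical OrderedQuery
class (the name and its methods are made up for illustration and are not part
of any spec). New keys are appended; existing keys are replaced in place:

```javascript
// Hypothetical OrderedQuery: a list of [key, value] pairs with the
// assignment semantics described above. Not part of any specification.
class OrderedQuery {
  constructor() {
    this.pairs = [];
  }

  // Replace the value in place if the key exists; otherwise append to the end.
  set(key, value) {
    const existing = this.pairs.find(([k]) => k === key);
    if (existing) {
      existing[1] = String(value);
    } else {
      this.pairs.push([key, String(value)]);
    }
  }

  toString() {
    return this.pairs
      .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
      .join("&");
  }
}

const xy = new OrderedQuery();
xy.set("x", "1");
xy.set("y", "2");

const yx = new OrderedQuery();
yx.set("y", "2");
yx.set("x", "1");
yx.set("y", "3"); // existing key: replaced in place, not moved to the end

console.log(xy.toString()); // x=1&y=2
console.log(yx.toString()); // y=3&x=1
```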

What this doesn't allow is creating things like "a=1&b=2&a=3".  You can
create "a=1&a=2&b=3" (url.query.a = ["1","2"]; url.query.b = "3"), but
there's no way to split the keys (a, b, a).  This is the limitation we were
really talking about.  This seems unlikely to be a real problem, and in the
unlikely case where it's really needed, it seems fine to require people to
just fall back on formatting the query string themselves and assign to
url.search.
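
The fallback could look roughly like this. The sketch uses the URL interface
as it later shipped in browsers and Node.js, not the url.query object
discussed here; example.com and the pair list are placeholders:

```javascript
// Uses the URL interface as it later shipped, not the url.query proposal.
const target = new URL("http://example.com/cgi");

// Format the interleaved query by hand and assign it to .search.
const interleaved = [["a", "1"], ["b", "2"], ["a", "3"]];
target.search = interleaved
  .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
  .join("&");

console.log(target.href); // http://example.com/cgi?a=1&b=2&a=3
```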

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-25 Thread Boris Zbarsky


On 9/25/12 10:36 PM, Glenn Maynard wrote:

> You usually don't care about the resulting order in that case, right?


It's not uncommon for servers to depend on a particular order of 
parameters in the query string and totally fail when the ordering is 
different.  Especially the sort of servers that have a .exe for their 
CGI instead of using an off-the-shelf CGI library.



> When do you care about being able to specifically create (or
> distinguish) "a=1&b=2" vs. "b=2&a=1"


Whenever the server will barf on one of them?  ;)

-Boris

Re: [whatwg] New URL Standard

2012-09-25 Thread Glenn Maynard

On Tue, Sep 25, 2012 at 9:27 PM, Boris Zbarsky  wrote:

> On 9/25/12 10:13 PM, Glenn Maynard wrote:
>
>> The obvious use case is constructing a URI with a given query by
>> hand, right?
>>
>> If you already have the "a=1&b=2" string, you can just assign it to
>> .search and not use the prepared-query-parameters interface at all.
>>
>
> I was thinking more like you have the arrays ["a", "b"] (hardcoded) and
> [1, 2] (provided by user).
>

You usually don't care about the resulting order in that case, right?
You'd just say something like

assert(key_names.length == user_data.length); // ["a", "b"].length == [1, 2].length
for(var i = 0; i < user_data.length; ++i)
    url.query[key_names[i]] = user_data[i];

When do you care about being able to specifically create (or distinguish)
"a=1&b=2" vs. "b=2&a=1" (or, a bit trickier, "a=1&b=2&a=3")?

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-25 Thread Boris Zbarsky


On 9/25/12 10:13 PM, Glenn Maynard wrote:

> The obvious use case is constructing a URI with a given query by
> hand, right?
>
> If you already have the "a=1&b=2" string, you can just assign it to
> .search and not use the prepared-query-parameters interface at all.


I was thinking more like you have the arrays ["a", "b"] (hardcoded) and 
[1, 2] (provided by user).


-Boris

Re: [whatwg] New URL Standard

2012-09-25 Thread Glenn Maynard

On Tue, Sep 25, 2012 at 8:36 PM, Boris Zbarsky  wrote:

> On 9/25/12 6:53 PM, Glenn Maynard wrote:
>
>> (Of course, a separate method could exist to get access to the underlying
>> order, if and when real use cases turn up that actually need it, and it's
>> not unlikely that there are use cases--but so far they haven't been
>> raised.
>>
>
> The obvious use case is constructing a URI with a given query by hand,
> right?
>

If you already have the "a=1&b=2" string, you can just assign it to .search
and not use the prepared-query-parameters interface at all.

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-25 Thread Boris Zbarsky


On 9/25/12 6:53 PM, Glenn Maynard wrote:

> (Of course, a separate method could exist to get access to the underlying
> order, if and when real use cases turn up that actually need it, and it's
> not unlikely that there are use cases--but so far they haven't been
> raised.


The obvious use case is constructing a URI with a given query by hand, 
right?


-Boris

Re: [whatwg] New URL Standard

2012-09-25 Thread Glenn Maynard

On Tue, Sep 25, 2012 at 5:14 PM, David Sheets  wrote:

> Looking up keys is easy in an association list. Filtering the list
> retains ordering. Appending to the list is well-defined. Folding into
> a dictionary is trivial and key merging can be defined according to
> the author's URL convention.
>

I'd suggest writing out what you mean in JavaScript or JS-like pseudocode,
demonstrating what it would actually look like to scripts and how it would
be used.  It's the quickest way to get API ideas across.

> > The right approach is probably to expose the results in an object-like form,
> > as Tab suggests, but to store the state internally in a list-like format,
> > with modifications defined in terms of mutations to the list.
>
> This sounds more complicated to implement while maintaining
> invariants. A dictionary with an associated total order is an
> association list.
>

I think it's pretty straightforward both to specify and to implement.  Of
course, implementations can use any internal data structure they like as
long as the end result is the same.

> Why hide the order?
>

Because the natural JS interface, object-like access, doesn't allow it.  If
you think there's an API with similar convenience to an object and natural
usage in the language, then feel free to suggest it as I described above.

(Of course, a separate method could exist to get access to the underlying
order, if and when real use cases turn up that actually need it, and it's
not unlikely that there are use cases--but so far they haven't been
raised.  There's nothing wrong with exposing multiple API "views" into the
same data set, when they have clearly distinct goals and attempts to meet
both sets of goals with the same API fail.)

> >> Like: query = new URL("?a=b&c=d"); query.query["a"] = "x";
> >> query.toString() == "?a=x&c=d";
>
> > That won't work, since "?a=b&c=d" isn't a valid URL.
>
> "?a=b&c=d" is a valid URI reference. @href="?a=b&c=d" is valid.
>

It's not a valid *absolute* URL, which is what you used above.  You can
sidestep this either by prefixing it to make it into a valid URL (as I
suggested) or by specifying a base URL; they're both pretty much equivalent
here.
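
For comparison, a small sketch of how the URL constructor that later shipped
treats this case. It throws rather than setting an invalid flag, which differs
from the draft behavior discussed in this thread; example.com is a placeholder
base:

```javascript
// Behavior of the URL constructor as it later shipped in browsers and Node.js.
let threw = false;
try {
  new URL("?a=b&c=d"); // no base: not an absolute URL
} catch (e) {
  threw = true;
}

// With a base URL, the same reference resolves.
const resolved = new URL("?a=b&c=d", "http://example.com/page");

console.log(threw);           // true
console.log(resolved.href);   // http://example.com/page?a=b&c=d
console.log(resolved.search); // ?a=b&c=d
```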

> This is a use case for parsing without composed relative resolution.
>

Maybe, but that's a pretty complicated approach for this use case.

(To summarize the mechanism he's referring to, as I understand it: the
ability to use this API to parse, modify and output relative URLs without
resolving them to a base URL at all.)

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-25 Thread Alexandre Morgaut


On 26 sept. 2012, at 00:14, David Sheets wrote:

> On Tue, Sep 25, 2012 at 2:13 PM, Glenn Maynard  wrote:
>> On Mon, Sep 24, 2012 at 7:18 PM, David Sheets  wrote:
>>>
>
>> The right approach is probably to expose the results in an object-like form,
>> as Tab suggests, but to store the state internally in a list-like format,
>> with modifications defined in terms of mutations to the list.

Isn't that what the Web Storage API does? There, each key can be found by 
index using the key() method:

http://www.w3.org/TR/webstorage/#dom-storage-key

My concern is just that "key" should probably be named "getKey" to avoid name 
collision with parameter names




Alexandre Morgaut
Wakanda Community Manager

4D SAS
60, rue d'Alsace
92110 Clichy
France

Standard : +33 1 40 87 92 00
Email :alexandre.morg...@4d.com
Web :  www.4D.com

Re: [whatwg] New URL Standard

2012-09-25 Thread David Sheets

On Tue, Sep 25, 2012 at 2:13 PM, Glenn Maynard  wrote:
> On Mon, Sep 24, 2012 at 7:18 PM, David Sheets  wrote:
>>
>> Always. The appropriate interface is (string * string?) list. Id est,
>>
>> an association list of keys and nullable values (null is
>> key-without-value and empty string is empty-value). If you prefer to
>> not use a nullable value and don't like tuple representations in JS,
>> you could use type: string list list
>>
>> i.e.
>>
>>
>> [["key_without_value"],[""],["key","value"],[],["numbers",1,2,3,4],["",""],["","",""]]
>
>
> This isn't an appropriate interface.  It's terrible for 99.9% of use cases,
> where you really want dictionary-like access.

This is the direct representation of the query string key-value convention.

Looking up keys is easy in an association list. Filtering the list
retains ordering. Appending to the list is well-defined. Folding into
a dictionary is trivial and key merging can be defined according to
the author's URL convention.

> The right approach is probably to expose the results in an object-like form,
> as Tab suggests, but to store the state internally in a list-like format,
> with modifications defined in terms of mutations to the list.

This sounds more complicated to implement while maintaining
invariants. A dictionary with an associated total order is an
association list.

> That is, parsing "a=1&b=2&a=3" would result in an internal representation
> like [('a', '1'), ('b', '2'), ('a', '3')].  When viewed from script, you see
> {a: ['1', '3'], 'b': ['2']}.  If you serialize it right back to a URL the
> internal representation is unchanged, so the original order is preserved.
> The mutation algorithms can then do their best to preserve the list as
> reasonably as they can (eg. assigning query.a = ['5', '6'] would remove all
> 'a' keys, then insert items at the location of the first removed item, or
> append if there were none).

Why hide the order?

>> Is this not already supported by creating a new URL which contains
>> only a relative query part?
>>
>> Like: query = new URL("?a=b&c=d"); query.query["a"] = "x";
>> query.toString() == "?a=x&c=d";
>>
>> Why is a new interface necessary?
>
>
> That won't work, since "?a=b&c=d" isn't a valid URL.

"?a=b&c=d" is a valid URI reference. @href="?a=b&c=d" is valid.

> The invalid flag will
> be set, so the change to .query will be a no-op, and .href (presumably what
> toString will invoke) would return the original URL, "?a=b&c=d", not
> "?a=x&c=d".  You'd need to do something like:
>
> var query = new URL("http://example.com?" + url.hash);
> query.query.a = "x";
> url.hash = query.search.slice(1); // remove the leading "?"
>
> That's awkward, but maybe it's good enough.

This is a use case for parsing without composed relative resolution.

Re: [whatwg] New URL Standard

2012-09-25 Thread Glenn Maynard

On Mon, Sep 24, 2012 at 7:18 PM, David Sheets  wrote:

> Always. The appropriate interface is (string * string?) list. Id est,
> an association list of keys and nullable values (null is
> key-without-value and empty string is empty-value). If you prefer to
> not use a nullable value and don't like tuple representations in JS,
> you could use type: string list list
>
> i.e.
>
>
> [["key_without_value"],[""],["key","value"],[],["numbers",1,2,3,4],["",""],["","",""]]
>

This isn't an appropriate interface.  It's terrible for 99.9% of use cases,
where you really want dictionary-like access.

The right approach is probably to expose the results in an object-like
form, as Tab suggests, but to store the state internally in a list-like
format, with modifications defined in terms of mutations to the list.

That is, parsing "a=1&b=2&a=3" would result in an internal representation
like [('a', '1'), ('b', '2'), ('a', '3')].  When viewed from script, you
see {a: ['1', '3'], 'b': ['2']}.  If you serialize it right back to a URL
the internal representation is unchanged, so the original order is
preserved.  The mutation algorithms can then do their best to preserve the
list as reasonably as they can (eg. assigning query.a = ['5', '6'] would
remove all 'a' keys, then insert items at the location of the first removed
item, or append if there were none).
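
Roughly, in JS. This is only a sketch of the model: parseQuery, toView, assign
and serialize are hypothetical helper names, and percent-decoding is left out
to keep it short:

```javascript
// Hypothetical helpers illustrating the list-backed model; names are made up.

// Parse "a=1&b=2&a=3" into [["a","1"],["b","2"],["a","3"]].
function parseQuery(str) {
  return str.split("&").map((part) => {
    const i = part.indexOf("=");
    return i < 0 ? [part, ""] : [part.slice(0, i), part.slice(i + 1)];
  });
}

// The script-visible view: key -> array of values, in list order.
function toView(list) {
  const view = {};
  for (const [k, v] of list) {
    (view[k] = view[k] || []).push(v);
  }
  return view;
}

// Assigning values to a key: remove all pairs with that key, then insert the
// new pairs where the first removed pair was, or append if there was none.
function assign(list, key, values) {
  const first = list.findIndex(([k]) => k === key);
  const rest = list.filter(([k]) => k !== key);
  const at = first < 0 ? rest.length : first;
  rest.splice(at, 0, ...values.map((v) => [key, String(v)]));
  return rest;
}

const serialize = (list) => list.map(([k, v]) => k + "=" + v).join("&");

const list = parseQuery("a=1&b=2&a=3");
console.log(JSON.stringify(toView(list)));         // {"a":["1","3"],"b":["2"]}
console.log(serialize(list));                      // a=1&b=2&a=3 (order preserved)
console.log(serialize(assign(list, "a", [5, 6]))); // a=5&a=6&b=2
console.log(serialize(assign(list, "c", ["7"])));  // a=1&b=2&a=3&c=7
```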

> Is this not already supported by creating a new URL which contains
> only a relative query part?
>
> Like: query = new URL("?a=b&c=d"); query.query["a"] = "x";
> query.toString() == "?a=x&c=d";
>
> Why is a new interface necessary?
>

That won't work, since "?a=b&c=d" isn't a valid URL.  The invalid flag will
be set, so the change to .query will be a no-op, and .href (presumably what
toString will invoke) would return the original URL, "?a=b&c=d", not
"?a=x&c=d".  You'd need to do something like:

var query = new URL("http://example.com?" + url.hash);
query.query.a = "x";
url.hash = query.search.slice(1); // remove the leading "?"

That's awkward, but maybe it's good enough.
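
A sketch of both options using the interfaces that later shipped (URL and
URLSearchParams), not the .query object discussed here. In the shipped API
.hash includes the leading "#", so the sketch strips it; example.com and the
parameter names are placeholders:

```javascript
// Uses URL and URLSearchParams as they later shipped.
const page = new URL(
  "http://example.com/server/side/path?server-side-query=1#client-side-query=1"
);

// Option 1: the dummy-base workaround. .hash starts with "#", so drop it.
const scratch = new URL("http://example.com?" + page.hash.slice(1));
scratch.searchParams.set("client-side-query", "2");
page.hash = scratch.search.slice(1); // remove the leading "?"

// Option 2: a query object usable in isolation, with no URL around it.
const standalone = new URLSearchParams("a=b&c=d");
standalone.set("a", "x");

console.log(page.href);
// http://example.com/server/side/path?server-side-query=1#client-side-query=2
console.log(standalone.toString()); // a=x&c=d
```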

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-25 Thread Anne van Kesteren

On Tue, Sep 25, 2012 at 8:20 PM, David Sheets  wrote:
> On Tue, Sep 25, 2012 at 8:03 AM, Anne van Kesteren  wrote:
>> FWIW, given that browsers happily do requests to servers with
>> characters in the URL that are "invalid" per the RFC (they are not URL
>> escaped) and servers handle them fine I think we should make the
>> syntax more lenient. E.g. allowing [ and ] in the path and query
>> component is fine I think.
>
> I believe this would introduce ambiguity for parsing URI references.
> Is "[::1]" an authority reference or a path segment reference?

Path.
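
To illustrate with the URL parser as it later shipped in browsers and Node.js
(the base URL is a placeholder): without a leading "//" the string is treated
as a path segment, and with it as an authority.

```javascript
// Behavior of the shipped URL parser; the base URL is a placeholder.
const base = "http://example.com/dir/";

const asPath = new URL("[::1]", base);   // no leading "//": a path segment
const asHost = new URL("//[::1]", base); // leading "//": an authority

console.log(asPath.pathname); // /dir/[::1]
console.log(asPath.hostname); // example.com
console.log(asHost.hostname); // [::1]
```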

>> As for the question about why not build this on top of RFC 3986. That
>> does not handle non-ASCII code points. RFC 3987 does, but is not a
>> suitable start either. As shown in http://url.spec.whatwg.org/ it is
>> quite trivial to combine parsing, resolving, and canonicalizing into a
>> single algorithm (and deal with URI/IRI, now URL, as one).
>
> Composition is often trivial but unenlightening. There is necessarily
> less information in a partially evaluated function composition than in
> the functions in isolation.
>
> Defining a formal language accurately and in a broadly understandable
> manner is nontrivial. Your task is nontrivial.

I have no idea what you are talking about.

> What is the acceptable trade-off between (y)our hassle and the time of
> technologists in the coming decades? Will you make it easier or harder
> for them to reconcile WHATWG-URL and Internet Standard 66 (RFC 3986)?

I'm not sure why I should care about STD 66. It is inaccurate, does
not match implementations, and cannot be used to write new
implementations that want to be compatible with content and services
on the web. I am tackling those problems, and writing them down in a
way we have written standards for over eight years now, which thus far
has been successful.

(Obviously STD 66 is a document many people value, but these people
generally have not looked at the particulars or written software that
deals with Location headers whose values contain spaces, etc. assuming
they have a "correct" STD 66 implementation to begin with. If there is
a document that addresses URLs on the web better, they will use that
instead.)

-- 
http://annevankesteren.nl/

Re: [whatwg] New URL Standard

2012-09-25 Thread Ian Hickson

On Tue, 25 Sep 2012, David Sheets wrote:
> >
> > Not necessarily, but that's certainly possible. Personally I would 
> > recommend that we not change the definition of what is conforming from 
> > the current RFC3986/RFC3987 rules, except to the extent that the 
> > character encoding affects it (as per the HTML standard today).
> >
> >http://whatwg.org/html#valid-url
> 
> I believe the '#' character in the fragment identifier qualifies.

Not sure what you mean.

Sounds like Anne is indeed expecting to widen the range of valid URLs 
though, so please disregard my comments on the matter. :-)


> > We should definitely define what is a conforming URL, yes (either 
> > directly, or by reference to the RFCs, as HTML does now). Whether 
> > prose or a structured language is the better way to go depends on what 
> > the conformance rules are -- HTML is a good example here: it has parts 
> > that are defined in terms of prose (e.g. the HTML syntax as a whole), 
> > and other parts that are defined in terms of BNF (e.g. constraints on 
> > the contents of

Re: [whatwg] New URL Standard

2012-09-25 Thread David Sheets

On Tue, Sep 25, 2012 at 8:03 AM, Anne van Kesteren  wrote:
> On Tue, Sep 25, 2012 at 6:18 AM, Ian Hickson  wrote:
>> Not necessarily, but that's certainly possible. Personally I would
>> recommend that we not change the definition of what is conforming from the
>> current RFC3986/RFC3987 rules, except to the extent that the character
>> encoding affects it (as per the HTML standard today).
>>
>>http://whatwg.org/html#valid-url
>
> FWIW, given that browsers happily do requests to servers with
> characters in the URL that are "invalid" per the RFC (they are not URL
> escaped) and servers handle them fine I think we should make the
> syntax more lenient. E.g. allowing [ and ] in the path and query
> component is fine I think.

I believe this would introduce ambiguity for parsing URI references.
Is "[::1]" an authority reference or a path segment reference?

> As for the question about why not build this on top of RFC 3986. That
> does not handle non-ASCII code points. RFC 3987 does, but is not a
> suitable start either. As shown in http://url.spec.whatwg.org/ it is
> quite trivial to combine parsing, resolving, and canonicalizing into a
> single algorithm (and deal with URI/IRI, now URL, as one).

Composition is often trivial but unenlightening. There is necessarily
less information in a partially evaluated function composition than in
the functions in isolation.

Defining a formal language accurately and in a broadly understandable
manner is nontrivial. Your task is nontrivial.

> Trying to
> somehow patch the language in RFC 3987 to deal with the encoding
> problems for the query component, to deal with parsing
> http:example.org when there is a base URL with the same scheme versus
> when there isn't, etc. is way more of a hassle I think, though I am
> happy to be proven wrong.

I believe the encoding problems are handled by a normalization
algorithm and parsing relative references is handled by the base
scheme module.

What is the acceptable trade-off between (y)our hassle and the time of
technologists in the coming decades? Will you make it easier or harder
for them to reconcile WHATWG-URL and Internet Standard 66 (RFC 3986)?

> --
> http://annevankesteren.nl/

Re: [whatwg] New URL Standard

2012-09-25 Thread David Sheets

On Mon, Sep 24, 2012 at 9:18 PM, Ian Hickson  wrote:
>
> This is Anne's spec, so I'll let him give more canonical answers, but:
>
> On Mon, 24 Sep 2012, David Sheets wrote:
>>
>> Your conforming WHATWG-URL syntax will have production rule alphabets
>> which are supersets of the alphabets in RFC3986.
>
> Not necessarily, but that's certainly possible. Personally I would
> recommend that we not change the definition of what is conforming from the
> current RFC3986/RFC3987 rules, except to the extent that the character
> encoding affects it (as per the HTML standard today).
>
>http://whatwg.org/html#valid-url

I believe the '#' character in the fragment identifier qualifies.

>> This is what I propose you define and it does not necessarily have to be
>> in BNF (though a production rule language of some sort probably isn't a
>> bad idea).
>
> We should definitely define what is a conforming URL, yes (either
> directly, or by reference to the RFCs, as HTML does now). Whether prose or
> a structured language is the better way to go depends on what the
> conformance rules are -- HTML is a good example here: it has parts that
> are defined in terms of prose (e.g. the HTML syntax as a whole), and other
> parts that are defined in terms of BNF (e.g. constraints on the contents
> of

Re: [whatwg] New URL Standard

2012-09-25 Thread Anne van Kesteren

On Tue, Sep 25, 2012 at 6:18 AM, Ian Hickson  wrote:
> Not necessarily, but that's certainly possible. Personally I would
> recommend that we not change the definition of what is conforming from the
> current RFC3986/RFC3987 rules, except to the extent that the character
> encoding affects it (as per the HTML standard today).
>
>http://whatwg.org/html#valid-url

FWIW, given that browsers happily do requests to servers with
characters in the URL that are "invalid" per the RFC (they are not URL
escaped) and servers handle them fine I think we should make the
syntax more lenient. E.g. allowing [ and ] in the path and query
component is fine I think.

As for the question about why not build this on top of RFC 3986. That
does not handle non-ASCII code points. RFC 3987 does, but is not a
suitable start either. As shown in http://url.spec.whatwg.org/ it is
quite trivial to combine parsing, resolving, and canonicalizing into a
single algorithm (and deal with URI/IRI, now URL, as one). Trying to
somehow patch the language in RFC 3987 to deal with the encoding
problems for the query component, to deal with parsing
http:example.org when there is a base URL with the same scheme versus
when there isn't, etc. is way more of a hassle I think, though I am
happy to be proven wrong.
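
The http:example.org case can be seen with the URL parser as it later shipped
in browsers and Node.js (base.example is a placeholder host). With a base URL
of the same scheme the input is resolved as a relative path; without one it is
parsed as a host:

```javascript
// Behavior of the shipped URL parser; base.example is a placeholder.
const withBase = new URL("http:example.org", "http://base.example/dir/file");
const withoutBase = new URL("http:example.org");

console.log(withBase.href);    // http://base.example/dir/example.org
console.log(withoutBase.href); // http://example.org/
```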

-- 
http://annevankesteren.nl/

Re: [whatwg] New URL Standard

2012-09-25 Thread Silvia Pfeiffer

On Tue, Sep 25, 2012 at 9:48 PM, Robin Berjon  wrote:
> On 25/09/2012 01:07 , Glenn Maynard wrote:
>>
>> On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr.
>> wrote:
>>>
>>> I suggest just making it a map from String->[String].  You probably
>>> want a little bit of magic - if the setter receives an array, replace
>>> the current value with it; anything else, stringify then wrap in an
>>> array and replace the current value.  The getter should return an
>>> empty array for non-existing params.  You should be able to set .query
>>> itself with an object, which empties out the map and then runs the
>>> setter over all the items.  Bam, every single method is now obsolete.
>>
>>
>> When should this API guarantee that it round-trips URLs cleanly (aside
>> from
>> quoting differences)?  For example, maintaining order in "a=1&b=2&a=1",
>> and
>> representing things like "a=1&b" (no '=') and "a&&b" (no key at all).
>
>
> And round-tripping using ; as the separator instead of &. I mention this
> because I've seen actual production code (more than once) that relied on
> this. I have no idea how common it is though. I'm guessing not too much, but
> probably some since it was in HTML 4.01:
>
> http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2
>
> Of course another option is to just not parse that into key-value pairs in
> the first place.

I have also seen key-value pairs separated both by "&" and by ";", but
not in real life in quite some time. See also the discussion here:
[1]. For media fragment URIs we chose to only recommend use of "&" [2]
(see section 51. " "&" is the only primary separator for name-value
pairs, but some server-side languages also treat ";" as a separator.
").

Cheers,
Silvia.

[1] https://discussion.dreamhost.com/thread-134179.html
[2] http://www.w3.org/TR/media-frags/

Re: [whatwg] New URL Standard

2012-09-25 Thread Alexandre Morgaut


On 25 sept. 2012, at 13:48, Robin Berjon wrote:

> On 25/09/2012 01:07 , Glenn Maynard wrote:
> And round-tripping using ; as the separator instead of &. I mention this
> because I've seen actual production code (more than once) that relied on
> this. I have no idea how common it is though. I'm guessing not too much,
> but probably some since it was in HTML 4.01:
>
> http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2
>
> Of course another option is to just not parse that into key-value pairs
> in the first place.

Technically ";" might also be interpreted as part of the value.
I think treating it as a separator would introduce more problems than it 
would solve (my 2 cents).
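
As a point of reference, the URLSearchParams interface that later shipped
splits only on "&", so a ";" ends up inside the value. A short sketch:

```javascript
// URLSearchParams (as later shipped) treats only "&" as a separator.
const semi = new URLSearchParams("a=1;b=2");
const amp = new URLSearchParams("a=1&b=2");

console.log(semi.get("a")); // 1;b=2
console.log(semi.has("b")); // false
console.log(amp.get("b"));  // 2
```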

>
>> By the way, it would also be nice for the query part of this API to be
>> usable in isolation.
>
> +1

The query part should still be accessible via the search property





Alexandre Morgaut
Wakanda Community Manager

4D SAS
60, rue d'Alsace
92110 Clichy
France

Standard : +33 1 40 87 92 00
Email :alexandre.morg...@4d.com
Web :  www.4D.com

Re: [whatwg] New URL Standard

2012-09-25 Thread Robin Berjon


On 25/09/2012 01:07 , Glenn Maynard wrote:

> On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr. wrote:
>
>> I suggest just making it a map from String->[String].  You probably
>> want a little bit of magic - if the setter receives an array, replace
>> the current value with it; anything else, stringify then wrap in an
>> array and replace the current value.  The getter should return an
>> empty array for non-existing params.  You should be able to set .query
>> itself with an object, which empties out the map and then runs the
>> setter over all the items.  Bam, every single method is now obsolete.
>
> When should this API guarantee that it round-trips URLs cleanly (aside from
> quoting differences)?  For example, maintaining order in "a=1&b=2&a=1", and
> representing things like "a=1&b" (no '=') and "a&&b" (no key at all).


And round-tripping using ; as the separator instead of &. I mention this 
because I've seen actual production code (more than once) that relied on 
this. I have no idea how common it is though. I'm guessing not too much, 
but probably some since it was in HTML 4.01:


http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2

Of course another option is to just not parse that into key-value pairs 
in the first place.



> By the way, it would also be nice for the query part of this API to be
> usable in isolation.


+1

--
Robin Berjon - http://berjon.com/ - @robinberjon

Re: [whatwg] New URL Standard

2012-09-24 Thread Ian Hickson

This is Anne's spec, so I'll let him give more canonical answers, but:

On Mon, 24 Sep 2012, David Sheets wrote:
> 
> Your conforming WHATWG-URL syntax will have production rule alphabets 
> which are supersets of the alphabets in RFC3986.

Not necessarily, but that's certainly possible. Personally I would 
recommend that we not change the definition of what is conforming from the 
current RFC3986/RFC3987 rules, except to the extent that the character 
encoding affects it (as per the HTML standard today).

   http://whatwg.org/html#valid-url

> This is what I propose you define and it does not necessarily have to be 
> in BNF (though a production rule language of some sort probably isn't a 
> bad idea).

We should definitely define what is a conforming URL, yes (either 
directly, or by reference to the RFCs, as HTML does now). Whether prose or 
a structured language is the better way to go depends on what the 
conformance rules are -- HTML is a good example here: it has parts that 
are defined in terms of prose (e.g. the HTML syntax as a whole), and other 
parts that are defined in terms of BNF (e.g. constraints on the contents 
of

Re: [whatwg] New URL Standard

2012-09-24 Thread David Sheets

On Mon, Sep 24, 2012 at 5:23 PM, Ian Hickson  wrote:
> On Mon, 24 Sep 2012, David Sheets wrote:
>>
>> Is there an issue with defining WHATWG-URL syntax as a grammar extension
>> to the URI syntax in RFC3986?
>
> In general, BNF isn't very useful for defining the parsing rules when you
> also need to handle non-conforming content in a correct manner. Really it
> is only useful for saying whether or not content is conforming.

Your conforming WHATWG-URL syntax will have production rule alphabets
which are supersets of the alphabets in RFC3986. This is what I
propose you define and it does not necessarily have to be in BNF
(though a production rule language of some sort probably isn't a bad
idea).

If you read my mail carefully, you will notice that I address the
non-conforming identifier case in the initial canonicalization
algorithm. This normalization step is separate from the syntax of
conforming WHATWG-URLs and would define how non-conforming strings are
interpreted as conforming strings. The parsing algorithm then provides
a map from these strings into a data structure.

Error recovery and extended syntax for conforming representations are
orthogonal.

How will WHATWG-URLs which use the syntax extended from RFC3986 map
into RFC3986 URI references for systems that only support those?

Re: [whatwg] New URL Standard

2012-09-24 Thread Ian Hickson

On Mon, 24 Sep 2012, David Sheets wrote:
> 
> Is there an issue with defining WHATWG-URL syntax as a grammar extension 
> to the URI syntax in RFC3986?

In general, BNF isn't very useful for defining the parsing rules when you 
also need to handle non-conforming content in a correct manner. Really it 
is only useful for saying whether or not content is conforming.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] New URL Standard

2012-09-24 Thread David Sheets

On Mon, Sep 24, 2012 at 4:07 PM, Glenn Maynard  wrote:
> On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr. wrote:
>
>> I suggest just making it a map from String->[String].  You probably
>> want a little bit of magic - if the setter receives an array, replace
>> the current value with it; anything else, stringify then wrap in an
>> array and replace the current value.  The getter should return an
>> empty array for non-existing params.  You should be able to set .query
>> itself with an object, which empties out the map and then runs the
>> setter over all the items.  Bam, every single method is now obsolete.
>>
>
> When should this API guarantee that it round-trips URLs cleanly (aside from
> quoting differences)?  For example, maintaining order in "a=1&b=2&a=1", and
> representing things like "a=1&b" (no '=') and "a&&b" (no key at all).

Always. The appropriate interface is (string * string?) list. Id est,
an association list of keys and nullable values (null is
key-without-value and empty string is empty-value). If you prefer to
not use a nullable value and don't like tuple representations in JS,
you could use type: string list list

i.e.

[["key_without_value"],[""],["key","value"],[],["numbers",1,2,3,4],["",""],["","",""]]

becomes

"?key_without_value&&key=value&&numbers=1,2,3,4&=&=,"

where I've assumed that values after the second are concatenated with
commas (but it could be semicolons or some other separator).
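
A minimal sketch of the list-of-lists serialization described above, in TypeScript. The function name `serializeQuery` is made up for illustration, the comma separator follows the assumption stated in this message, and percent-encoding is deliberately left out.

```typescript
// Hypothetical helper: serialize an ordered list of [key, value...] entries.
// An empty entry yields an empty segment, a one-element entry yields a bare
// key, and any further elements are joined with commas after "=".
function serializeQuery(entries: string[][]): string {
  const segments = entries.map((entry) => {
    if (entry.length === 0) return "";
    if (entry.length === 1) return entry[0];
    return entry[0] + "=" + entry.slice(1).join(",");
  });
  return "?" + segments.join("&");
}

// The example list from the message, with the numbers written as strings.
const exampleEntries: string[][] = [
  ["key_without_value"], [""], ["key", "value"], [],
  ["numbers", "1", "2", "3", "4"], ["", ""], ["", "", ""],
];
const serializedExample = serializeQuery(exampleEntries);
```

Because the entries stay in a list, both the order of keys and the difference between a bare key and an empty value survive serialization.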

Unfortunately, JavaScript does not have any lightweight product types
so a decision like this is necessary.

> Not round-tripping URLs might have annoying side-effects, like trying to
> use history.replaceState to replace the path portion of the URL, and
> unexpectedly having the query part of the URL get shuffled around or
> changed in other ways.

That would be unacceptably broken.

> Maybe it could guarantee that the query round-trips only if the value is
> never modified (only assigned via the ctor or assigning to href), but once
> you modify the query, the order becomes normalized and any other
> non-round-trip side effects happen.

Why can't as much information as possible be preserved? There exist
many URI manipulation libraries that support maximal preservation.

> By the way, it would also be nice for the query part of this API to be
> usable in isolation.  I often put query-like strings in the hash, resulting
> in URLs like "
> http://example.com/server/side/path?server-side-query=1#client/side/path?client-side-query=1",
> and it would be nice to be able to work with both of these with the same
> interface.  That is, query = new URLQuery("a=b&c=d"); query["a"] = "x";
> query.toString() == "a=x&c=d";

Is this not already supported by creating a new URL which contains
only a relative query part?

Like: query = new URL("?a=b&c=d"); query.query["a"] = "x";
query.toString() == "?a=x&c=d";

Why is a new interface necessary?

> --
> Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-24 Thread Glenn Maynard

On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr. wrote:

> I suggest just making it a map from String->[String].  You probably
> want a little bit of magic - if the setter receives an array, replace
> the current value with it; anything else, stringify then wrap in an
> array and replace the current value.  The getter should return an
> empty array for non-existing params.  You should be able to set .query
> itself with an object, which empties out the map and then runs the
> setter over all the items.  Bam, every single method is now obsolete.
>

When should this API guarantee that it round-trips URLs cleanly (aside from
quoting differences)?  For example, maintaining order in "a=1&b=2&a=1", and
representing things like "a=1&b" (no '=') and "a&&b" (no key at all).
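
To make the three cases concrete, here is a minimal TypeScript sketch of a representation that does round-trip them. The names `QueryPair`, `parsePairs` and `serializePairs` are hypothetical, and quoting (percent-encoding) is ignored entirely.

```typescript
// Hypothetical ordered representation: one [key, value] pair per segment,
// with value === null meaning that the segment contained no "=".
type QueryPair = [string, string | null];

function parsePairs(query: string): QueryPair[] {
  return query.split("&").map((segment): QueryPair => {
    const eq = segment.indexOf("=");
    return eq === -1
      ? [segment, null]
      : [segment.slice(0, eq), segment.slice(eq + 1)];
  });
}

function serializePairs(pairs: QueryPair[]): string {
  return pairs
    .map(([key, value]) => (value === null ? key : key + "=" + value))
    .join("&");
}

// The three examples from the paragraph above.
const roundTripInputs = ["a=1&b=2&a=1", "a=1&b", "a&&b"];
const roundTripOutputs = roundTripInputs.map((q) => serializePairs(parsePairs(q)));
```

A plain key-to-values map loses exactly the information this list keeps: the position of repeated keys, the missing "=", and the empty segment.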

Not round-tripping URLs might have annoying side-effects, like trying to
use history.replaceState to replace the path portion of the URL, and
unexpectedly having the query part of the URL get shuffled around or
changed in other ways.

Maybe it could guarantee that the query round-trips only if the value is
never modified (only assigned via the ctor or assigning to href), but once
you modify the query, the order becomes normalized and any other
non-round-trip side effects happen.

By the way, it would also be nice for the query part of this API to be
usable in isolation.  I often put query-like strings in the hash, resulting
in URLs like "
http://example.com/server/side/path?server-side-query=1#client/side/path?client-side-query=1",
and it would be nice to be able to work with both of these with the same
interface.  That is, query = new URLQuery("a=b&c=d"); query["a"] = "x";
query.toString() == "a=x&c=d";
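
As a rough illustration of the use case, the TypeScript sketch below splits the example URL into a server-side and a client-side half, each with its own path and query string. The helper names are invented for this example; the point is only that the same "path?query" split applies on both sides of the "#".

```typescript
// Split a string of the form "path?query" at the first "?".
function splitPathAndQuery(s: string): { path: string; query: string } {
  const q = s.indexOf("?");
  return q === -1
    ? { path: s, query: "" }
    : { path: s.slice(0, q), query: s.slice(q + 1) };
}

// Separate the part before "#" from the fragment, then apply the same
// split to both halves.
function splitBothSides(url: string) {
  const hash = url.indexOf("#");
  const before = hash === -1 ? url : url.slice(0, hash);
  const fragment = hash === -1 ? "" : url.slice(hash + 1);
  return {
    server: splitPathAndQuery(before),
    client: splitPathAndQuery(fragment),
  };
}

const bothSides = splitBothSides(
  "http://example.com/server/side/path?server-side-query=1#client/side/path?client-side-query=1"
);
```

A query interface usable in isolation could then be handed either `bothSides.server.query` or `bothSides.client.query`.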

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-24 Thread David Sheets

On Mon, Sep 24, 2012 at 2:34 AM, Anne van Kesteren  wrote:
> On Sat, Sep 22, 2012 at 9:10 AM, Alexandre Morgaut  
> wrote:
>> Shouldn't this document have references on some of the URL related RFCs:
>
> The plan is to obsolete the RFCs. But yes, I will add some references
> in the Goals section most likely. Similar to what has been done in the
> DOM Standard.

Is there an issue with defining WHATWG-URL syntax as a grammar
extension to the URI syntax in RFC3986?

How about splitting the definition of the parsing algorithm into a
canonicalization algorithm and a separate parser for the extended
syntax? The type would be string -> string with the codomain as a
valid, unique WHATWG-URL serialization. Implementations/IDL could
provide only the composition of canonicalization and parsing but
humans trying to understand the semantics of the present algorithm
would be aided by having these phases explicitly defined.
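
A minimal TypeScript sketch of that two-phase shape, under loud assumptions: the three canonicalization rules below are toy stand-ins chosen for illustration, not rules from any specification, and the second phase simply uses the component-splitting regular expression from RFC 3986, Appendix B. All function names are hypothetical.

```typescript
interface UrlParts {
  scheme: string;
  authority: string;
  path: string;
  query: string;
  fragment: string;
}

// Phase 1 (string -> string). Toy rules only: trim whitespace, turn "\"
// into "/", and lowercase the scheme.
function canonicalize(input: string): string {
  const slashed = input.trim().replace(/\\/g, "/");
  return slashed.replace(/^[A-Za-z][A-Za-z0-9+.\-]*:/, (s) => s.toLowerCase());
}

// Phase 2: split a canonical string into components with the regular
// expression given in RFC 3986, Appendix B.
const RFC3986_SPLIT =
  /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function parseCanonical(url: string): UrlParts {
  const m = RFC3986_SPLIT.exec(url);
  if (m === null) throw new Error("unreachable: the pattern matches any string");
  return {
    scheme: m[2] || "",
    authority: m[4] || "",
    path: m[5] || "",
    query: m[7] || "",
    fragment: m[9] || "",
  };
}

// What an implementation would expose: the composition of both phases.
function parseUrl(input: string): UrlParts {
  return parseCanonical(canonicalize(input));
}

const composedParts = parseUrl("  HTTP:\\\\example.com\\a?b#c ");
```

Keeping `canonicalize` as a separate string-to-string function means its output can be inspected and tested independently of the parser.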

Will any means be provided to map WHATWG-URL to Internet Standard
RFC3986-URI? Is interoperability with the deployed base of URL
consumers a goal? How will those URLs in the extended syntax be mapped
into standard URIs? Will they be unrepresentable?

Thanks,

David Sheets

Re: [whatwg] New URL Standard

2012-09-24 Thread Tab Atkins Jr.

On Mon, Sep 24, 2012 at 2:34 AM, Anne van Kesteren  wrote:
> I have been thinking about introducing a .query attribute that would
> return a special interface for this purpose, but what the right API
> should be seems somewhat tricky. Adam and Erik came up with a solution
> that introduces eight new methods (see
> http://dvcs.w3.org/hg/url/raw-file/tip/Overview.html#url ) but I hope
> we can find something more elegant. (Unless we are stuck with their
> solution for some reason, but I believe that is not the case.)

Yeah, that interface is pretty unfriendly.

I suggest just making it a map from String->[String].  You probably
want a little bit of magic - if the setter receives an array, replace
the current value with it; anything else, stringify then wrap in an
array and replace the current value.  The getter should return an
empty array for non-existing params.  You should be able to set .query
itself with an object, which empties out the map and then runs the
setter over all the items.  Bam, every single method is now obsolete.
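
A minimal TypeScript sketch of the map described above. The class name `QueryMap` and its method names are hypothetical, and plain methods stand in for the property getter and setter magic; only the behaviour spelled out in the paragraph is modelled.

```typescript
// Hypothetical QueryMap: every key maps to an array of strings, and key
// order is the order of first insertion.
class QueryMap {
  private entries: Array<{ key: string; values: string[] }> = [];

  // Getter: an empty array for parameters that do not exist.
  get(key: string): string[] {
    for (const entry of this.entries) {
      if (entry.key === key) return entry.values.slice();
    }
    return [];
  }

  // Setter: an array replaces the current value; anything else is
  // stringified and wrapped in a one-element array.
  set(key: string, value: unknown): void {
    const values = Array.isArray(value)
      ? value.map((v) => String(v))
      : [String(value)];
    for (const entry of this.entries) {
      if (entry.key === key) {
        entry.values = values;
        return;
      }
    }
    this.entries.push({ key, values });
  }

  // Assigning a whole object: empty the map, then run the setter per item.
  replaceAll(source: { [key: string]: unknown }): void {
    this.entries = [];
    for (const key of Object.keys(source)) this.set(key, source[key]);
  }

  toString(): string {
    const parts: string[] = [];
    for (const entry of this.entries) {
      for (const v of entry.values) {
        parts.push(encodeURIComponent(entry.key) + "=" + encodeURIComponent(v));
      }
    }
    return parts.join("&");
  }
}

const queryMap = new QueryMap();
queryMap.set("a", ["1", "2"]);
queryMap.set("b", 3);
```

Here `queryMap.toString()` gives "a=1&a=2&b=3". As written, the sketch groups all values of a key together, so it cannot produce interleaved keys such as "a=1&b=2&a=1".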

~TJ

Re: [whatwg] New URL Standard

2012-09-24 Thread Boris Zbarsky


On 9/24/12 4:58 AM, Anne van Kesteren wrote:

> Say you have ; the concern is what e.g.
> a.protocol and a.pathname would return here. For invalid URLs they
> would return ":" and "" respectively. If we treat this as a valid URL
> you would get "data:" and "test". In Gecko I get "http:" and "". If I
> make that  Gecko will give meaningful
> answers (well pathname is still "", maybe that is okay and pathname
> should only work for hierarchical URLs).


Ah, I see.

So what happens here is that Gecko treats this as an invalid URL (more 
precisely, it cannot create an internal "URI" object from this string). 
 I guess that's what you were getting at: that data: URLs actually have 
a concept of "invalid" in Gecko.  This is actually true for all schemes 
Gecko supports, in general.  For example, "http://something or other" 
(with the spaces) will do the same thing.


For an invalid URI, .protocol currently returns "http:" in Gecko.  I 
have no idea why, offhand.  It could just as easily return ":".


As far as .pathname, what Gecko does is exactly what you say: .pathname 
only works on hierarchical schemes.



> More general, what I want is that for *any* given input in , xhr.open("GET", ...), new URL(...), etc. I want to be
> able to tell what the various URL components are going to be. The kind
> of predictability we have for the HTML parser, I want to have for the
> URL parser as well.


Yes, absolutely agreed.


> (If that means handling data URLs at the layer of the URL parser
> rather than a separate parser that goes over the path, as Gecko
> appears to be doing, so be it.)


We could change Gecko's handling here, for what it's worth.  One reason 
for the current handling is that right now we don't even make  into a 
link unless its href is a valid URI as far as Gecko is concerned.  But 
I'm considering changing that anyway, since no one else bothers with 
such niceties and they complicate implementation a bit...



>> If you want constructive advice, it would be interesting to get a full list
>> of all the weird stuff that UAs do here so we can evaluate which parts of it
>> are needed and why.  I can try to produce such a list for Gecko, if there
>> seems to be motion on the general idea.


> I think that would be a great start. I'm happy to start out with
> Gecko's behavior and iterate over time as feedback comes in from other
> browsers.


Hmm.  So here goes at least a partial list:

1)  On Windows and OS/2, Gecko replaces '\\' with '/' in file:// URI 
strings before doing anything else with the string when parsing a new 
URL.  That includes relative URI strings being resolved against a 
file:// base.


2)  file:// URIs are parsed as a "no authority" URL in Gecko.  Quoting 
the IDL comment:


35 /**
36  * blah:foo/bar    => blah:///foo/bar
37  * blah:/foo/bar   => blah:///foo/bar
38  * blah://foo/bar  => blah://foo/bar
39  * blah:///foo/bar => blah:///foo/bar
40  */

where the thing on the left is the input string and the thing on the 
right is the normalized form that the parser produces from it.  Note 
that this is different from how HTTP URIs are parsed, for all except the 
item on line number 38 there.


3)  Gecko does not allow setting a username, password, hostname, port on 
an existing "no authority" URL object, including file://.  Attempts to 
do that throw internally; I believe for web stuff it just becomes a no-op.


4)  For "no authority" URLs, including file://, on Windows and OS/2 
only, if what looks like authority section looks like a drive letter, 
it's treated as part of the path.  For example, "file://c:/" is treated 
as the filename "c:\".  "Looks like a drive letter" is defined as "ASCII 
letter (any case), followed by a ':' or '|' and then followed by end of 
string or '/' or '\\'".  I'm not sure why this is checking for '\\' 
again, honestly.  ;)


5)  When parsing a "no authority" URL (including file://), and when item 
4 above does not apply, it looks like Gecko skips everything after 
"file://" up until the next '/', '?', or '#' char before parsing path stuff.


6)  On Windows and OS/2, when dynamically parsing a path for a "no 
authority" URL (not sure whether this is actually web-exposed, fwiw...) 
Gecko will do something involving looking for a path that's only an 
ASCII letter followed by ':' or '|' followed by end of string.  I'm not 
quite sure what that part is about...  It might have to do with the fact 
that URI objects in Gecko can have concepts of "directory", "filename", 
"extension" or something like that.


7)  When doing URI equality comparisons, if two file:// URIs only differ 
in their directory/filename/extension (so the actual file path), then an 
equality comparison is done on the underlying file path objects.  What 
this means depends on the OS.  On "Unix" this is just a straight-up byte 
by byte compare of file paths.  I think OS X now follows the "Unix" code 
path as do most other supported platforms.  But note that "file path" in 
this case is normalized in various ways.  Spe
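
Items 2 and 4 of the list above are concrete enough to sketch in TypeScript. This is not Gecko's code: the function names are invented, the first function reproduces only the four rows of the quoted IDL comment, and the second encodes only the "looks like a drive letter" definition given in item 4.

```typescript
// Item 2: normalize a "no authority" URL string as in the quoted table.
// Only the four shapes from that table are covered.
function normalizeNoAuthority(spec: string): string {
  const colon = spec.indexOf(":");
  if (colon === -1) return spec; // no scheme; outside the scope of this sketch
  const scheme = spec.slice(0, colon);
  const rest = spec.slice(colon + 1);
  if (rest.slice(0, 2) === "//") return spec; // blah://foo/bar, blah:///foo/bar
  if (rest.charAt(0) === "/") return scheme + "://" + rest; // blah:/foo/bar
  return scheme + ":///" + rest; // blah:foo/bar
}

// Item 4: an ASCII letter (any case), then ":" or "|", then end of string
// or "/" or "\".
function looksLikeDriveLetter(s: string): boolean {
  return /^[A-Za-z][:|](?:$|[\/\\])/.test(s);
}

const normalizedRows = [
  "blah:foo/bar",
  "blah:/foo/bar",
  "blah://foo/bar",
  "blah:///foo/bar",
].map(normalizeNoAuthority);
```

For "file://c:/", the text after "file://" is "c:/", which passes `looksLikeDriveLetter` and so would be treated as part of the path rather than as an authority.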

Re: [whatwg] New URL Standard

2012-09-24 Thread Jukka K. Korpela


2012-09-24 15:26, Karl Dubost wrote:


> On 24 Sept. 2012, at 12:08, Jukka K. Korpela wrote:
>
>> It also means that the only immediately available source information for a
>> quotation will be an ISBN in URL format. So, for example, working offline, you
>> won't see even the title and the author. Would the quotation even satisfy the
>> legal requirements for quotations?
>
>
> unrelated and orthogonal.
> We are not talking about a bibliographical reference model, which would be useful
> on its own.


In the real world, this is about references. If you give authors a way 
to put references and citations into code, they will widely deploy the 
idea – they are even using title attributes for credits. Now, if 
specific support to cite attributes is added, this will be widely taken 
as *the* way of giving credits.


What other purpose would it serve? When the credits are properly written 
into content, using whatever system is regarded as appropriate, then the 
information in the credits can be turned into links, or dealt with in a 
link-like manner. It would be fairly safe to recognize ISBN strings 
(“ISBN” followed by numbers in a certain pattern) from plain content, 
even in the absence of markup.
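
A minimal TypeScript sketch of such recognition, with clear limits: it handles ISBN-10 only, the function name `findIsbn10` is invented, and the pattern is a simplification rather than a complete grammar for how ISBNs are printed. Candidates are kept only if their check digit is valid.

```typescript
// Find candidate ISBN-10 strings ("ISBN" followed by ten digits, the last
// possibly "X", with optional hyphens or spaces) and keep those whose
// check digit is valid (weights 10 down to 1, sum divisible by 11).
function findIsbn10(text: string): string[] {
  const pattern = /ISBN[:\s]*((?:\d[\s-]?){9}[\dXx])/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const digits = match[1].replace(/[\s-]/g, "").toUpperCase();
    let sum = 0;
    for (let i = 0; i < 10; i++) {
      const ch = digits.charAt(i);
      const value = ch === "X" ? 10 : parseInt(ch, 10);
      sum += value * (10 - i);
    }
    if (sum % 11 === 0) found.push(digits);
  }
  return found;
}

const isbnHits = findIsbn10("Quoted from ISBN 2-7073-1038-7, chapter 1.");
```

The check-digit test is what makes recognition from plain content "fairly safe": a random run of ten digits after the letters "ISBN" passes it only about one time in eleven.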


Yucca

Re: [whatwg] New URL Standard

2012-09-24 Thread Karl Dubost


On 24 Sept. 2012, at 12:08, Jukka K. Korpela wrote:
> It also means that the only immediately available source information for a 
> quotation will be an ISBN in URL format. So, for example, working offline, 
> you won't see even the title and the author. Would the quotation even satisfy 
> the legal requirements for quotations?

unrelated and orthogonal.
We are not talking about a bibliographical reference model, which would be useful 
on its own.

-- 
Karl Dubost - http://dev.opera.com/
Developer Relations, Opera Software

Re: [whatwg] New URL Standard

2012-09-24 Thread Alexandre Morgaut


On 24 sept. 2012, at 14:08, Alexandre Morgaut wrote:

>
> sms:+15105550101?body=hello%20there
>
> {
>     host: "+15105550101",
>     hostname: "+15105550101",
>     href: "+15105550101?body=hello%20there",
>     parameters: {
>         body: "hello there"
>     },
>     pathname: "",
>     port: "",
>     protocol: "sms:",
>     search: ""
> }

ooops

it should be

search: "?body=hello%20there"

of course
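
The corrected decomposition can be sketched in TypeScript as follows. Everything here is hypothetical: the `decompose` function is invented for illustration, and mapping the phone number to `host` follows the proposal in this thread rather than any established behaviour. The function assumes the query is well-formed percent-encoding (`decodeURIComponent` throws otherwise).

```typescript
interface Decomposed {
  protocol: string;
  host: string;
  search: string;
  parameters: { [name: string]: string };
}

// Hypothetical decomposition of a "scheme:target?query" URL such as sms:.
function decompose(url: string): Decomposed {
  const colon = url.indexOf(":");
  const protocol = url.slice(0, colon + 1);
  const rest = url.slice(colon + 1);
  const q = rest.indexOf("?");
  const host = q === -1 ? rest : rest.slice(0, q);
  const search = q === -1 ? "" : rest.slice(q);
  const parameters: { [name: string]: string } = {};
  if (search.length > 1) {
    for (const pair of search.slice(1).split("&")) {
      const eq = pair.indexOf("=");
      const name = eq === -1 ? pair : pair.slice(0, eq);
      const value = eq === -1 ? "" : pair.slice(eq + 1);
      // Names and values are decoded so callers never see "%20" and friends.
      parameters[decodeURIComponent(name)] = decodeURIComponent(value);
    }
  }
  return { protocol, host, search, parameters };
}

const smsParts = decompose("sms:+15105550101?body=hello%20there");
```

With this input, `smsParts.search` keeps the raw "?body=hello%20there" while `smsParts.parameters.body` holds the decoded "hello there".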




Alexandre Morgaut
Wakanda Community Manager

4D SAS
60, rue d'Alsace
92110 Clichy
France

Standard : +33 1 40 87 92 00
Email :alexandre.morg...@4d.com
Web :  www.4D.com

Re: [whatwg] New URL Standard

2012-09-24 Thread Alexandre Morgaut

On 24 sept. 2012, at 11:34, Anne van Kesteren wrote:

>> Could the search property have a key/value mapping?
>> ex:
>> http://test.com?param1=value1
>> -> var value1 = url.search.param1
>> "search" as "window.location" could still be usable as a string
>
> I have been thinking about introducing a .query attribute that would
> return a special interface for this purpose, but what the right API
> should be seems somewhat tricky. Adam and Erik came up with a solution
> that introduces eight new methods (see
> http://dvcs.w3.org/hg/url/raw-file/tip/Overview.html#url ) but I hope
> we can find something more elegant. (Unless we are stuck with their
> solution for some reason, but I believe that is not the case.)

Yes, I saw the methods, and as with XHR and its headers, I don't find them 
user-friendly enough.
The "search" property could stand as is, but I personally think that having a 
Web Storage like key/value mapping for the parameters would make the code more 
readable.
We could then have a "params" or "parameters" property with key / value mapping 
and implementing the Storage interface:
http://www.w3.org/TR/webstorage/#storage-0
Developers who are more comfortable with methods would then still be happy, and 
because of having the same interface, the learning curve would be better.

What I would love in the enhancement of parameter management is that the 
developer should no longer need to take care of URL encoding of the names and 
values; all that encoding/decoding could be done automatically, 
either with your proposed methods or using a Storage interface...
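
A minimal TypeScript sketch of that idea, assuming a hypothetical `ParameterStorage` class that borrows the `getItem` / `setItem` / `removeItem` method names from the Storage interface. Decoding happens when the query string is read and encoding when it is written back, so callers only ever handle plain strings.

```typescript
// Hypothetical Storage-like view over query parameters. Names and values
// are decoded on the way in and encoded on the way out.
class ParameterStorage {
  private names: string[] = [];
  private values: string[] = [];

  constructor(search: string) {
    const query = search.charAt(0) === "?" ? search.slice(1) : search;
    if (query === "") return;
    for (const pair of query.split("&")) {
      const eq = pair.indexOf("=");
      const name = eq === -1 ? pair : pair.slice(0, eq);
      const value = eq === -1 ? "" : pair.slice(eq + 1);
      this.setItem(decodeURIComponent(name), decodeURIComponent(value));
    }
  }

  getItem(name: string): string | null {
    const index = this.names.indexOf(name);
    return index === -1 ? null : this.values[index];
  }

  setItem(name: string, value: string): void {
    const index = this.names.indexOf(name);
    if (index === -1) {
      this.names.push(name);
      this.values.push(value);
    } else {
      this.values[index] = value;
    }
  }

  removeItem(name: string): void {
    const index = this.names.indexOf(name);
    if (index !== -1) {
      this.names.splice(index, 1);
      this.values.splice(index, 1);
    }
  }

  toString(): string {
    if (this.names.length === 0) return "";
    return "?" + this.names
      .map((n, i) => encodeURIComponent(n) + "=" + encodeURIComponent(this.values[i]))
      .join("&");
  }
}

const storedParams = new ParameterStorage("?body=hello%20there");
storedParams.setItem("subject", "a&b");
```

Like Web Storage, this keeps one value per name, so repeated parameters collapse to the last one; that is a limitation of the sketch, not a recommendation.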

>> Should this document include a more complete list of schemes with ones that 
>> are more and more used in URLs?
>
> Maybe, kinda depends on what turns out to be the ideal scope for the
> URL Standard. For now I only wanted to include those schemes relevant
> to the parser (and it may turn out there are a few more of those, e.g.
> mailto, javascript, data, and file might need some special casing).

Going progressively makes sense

>> Unfortunately, the URLUtil interface would not be adapted for them:
>> - the "protocol", "host", and "hostname" properties make sense and would 
>> work;
>> - the query part (search property) is used by the "mailto:" and "sms:" URIs;
>> - for "tel:" and "fax", we see "parameters" prefixed by ";" as the ones used 
>> in some media types, those parameters could be found in the search property
>
> We might not want to adapt it either because of the relative increase
> in complexity while not actually addressing many use cases. You want
> to modify query/path for http/https and maybe ws/wss a lot, but not so
> much for mailto I'd think.

I started my remarks by saying "Unfortunately...", but in the end, it looks like 
the Location/URL interface, in combination with the Storage interface, should 
fit any of the mentioned schemes. The only special case is the format of 
the "tel:" parameters (it'd be great if we could update the RFC). I must say 
I'm more comfortable with the matching of this URL interface with "mailto:", 
"tel:", "sms:", and "tv:" than with "data:" or "javascript:".

Below are some potential examples for those schemes using the URL and the Storage 
interfaces (without showing the methods):

mailto:j...@example.com?cc=b...@example.com&subject=current-issue&body=send%20current-issue%0D%0Asend%20index

{
    host: "j...@example.com",
    hostname: "j...@example.com",
    href: "j...@example.com?cc=b...@example.com&subject=current-issue&body=send%20current-issue%0D%0Asend%20index",
    parameters: {
        cc: "b...@example.com",
        subject: "current-issue",
        body: "send current-issue\r\nsend index"
    },
    pathname: "",
    port: "",
    protocol: "mailto:",
    search: "?cc=b...@example.com&body=hello"
}

tel:+11231231234;isub=8978

{
    host: "+11231231234",
    hostname: "+11231231234",
    href: "+11231231234;isub=8978",
    parameters: {
        isub: "8978"
    },
    pathname: "",
    port: "",
    protocol: "tel:",
    search: ""
}

sms:+15105550101?body=hello%20there

{
    host: "+15105550101",
    hostname: "+15105550101",
    href: "+15105550101?body=hello%20there",
    parameters: {
        body: "hello there"
    },
    pathname: "",
    port: "",
    protocol: "sms:",
    search: ""
}

tv:west.hbo.com

{
    host: "west.hbo.com",
    hostname: "west.hbo.com",
    href: "west.hbo.com",
    parameters: {},
    pathname: "",
    port: "",
    protocol: "tv:",
    search: ""
}

data:image/png;base64; 

{
    host: "",
    hostname: "",
    href: "image/png;base64; ",
    parameters: {}, // might include auto-generated mediaType & charset
                    // string parameters and base64 boolean parameter
    pathname: "",
    port: "",
    protocol: "data:",

Re: [whatwg] New URL Standard

2012-09-24 Thread Jukka K. Korpela


2012-09-24 12:47, Karl Dubost wrote:


> On cite attributes, I'm using urn:isbn:
>
> J'aime la liberté. J'aime être responsable
>    de mes actes. J'aime comprendre ce que je
>    fais… Et, cependant, je donne mon accord
>    à ce marché bizarre.
>
> Which I can use and parse with an extension in Opera [1] which convert it
> into a link to the Open Library. In the future I could give accessibilities
> to different services, and the user could choose its own reference system.


This is all very cool in its own way, and could be useful when used
with discipline within a discipline. But for a long time, such cool 
ideas will not be supported in most browsing situations. Yet, authors 
who know the cool idea will apply it and will fail to "duplicate" any 
credits in the normal visible content. This means that to most users, a 
quotation will appear without any credits or source information.


It also means that the only immediately available source information for 
a quotation will be an ISBN in URL format. So, for example, working 
offline, you won't see even the title and the author. Would the 
quotation even satisfy the legal requirements for quotations?


If the credits are additionally given in visible content, then *there* 
is the place to do cool things with ISBNs. The credits, when they 
include the ISBN in addition to author, title, etc., could have the ISBN 
part turned to an element like ISBN 
2-7073-1038-7. (This would still suffer from lack of compatibility 
with older user agents, creating non-working links on them, so maybe 
some new markup - which would simply be ignored by old user agents - 
would be better.)


The point, however, is that the cite attribute in  is broken 
by design and should not be implemented in any new ways (or old).


Yucca

Re: [whatwg] New URL Standard

2012-09-24 Thread Karl Dubost


On 21 Sept. 2012, at 17:16, Anne van Kesteren wrote:
> I took a crack at defining URLs: http://url.spec.whatwg.org/

Very cool.



On cite attributes, I'm using urn:isbn:


   J'aime la liberté. J'aime être responsable 
  de mes actes. J'aime comprendre ce que je 
  fais… Et, cependant, je donne mon accord 
  à ce marché bizarre.

   
Which I can use and parse with an extension in Opera [1] which converts it into 
a link to the Open Library. In the future I could give access to 
different services, and the user could choose their own reference system.

In this case.
http://openlibrary.org/books/OL8913264M/Djinn


All that said, it would be cool to be able to grab the relevant part of the URI 
without having to regex the string returned by the cite attribute.
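
For illustration, a TypeScript sketch of the kind of accessor this asks for: a function that splits a URN into its namespace identifier and namespace-specific string. The name `splitUrn` and the `nid` / `nss` field names are assumptions, and the ISBN in the usage line is only an example value, since the actual cite value is not shown in this message.

```typescript
// Split "urn:<nid>:<nss>" into its two parts without a regular expression.
// Returns null when the input does not have that shape.
function splitUrn(value: string): { nid: string; nss: string } | null {
  const first = value.indexOf(":");
  const second = value.indexOf(":", first + 1);
  if (first === -1 || second === -1) return null;
  if (value.slice(0, first).toLowerCase() !== "urn") return null;
  const nid = value.slice(first + 1, second).toLowerCase();
  const nss = value.slice(second + 1);
  if (nid === "" || nss === "") return null;
  return { nid, nss };
}

// Example value only (hypothetical cite attribute content).
const citeParts = splitUrn("urn:isbn:2-7073-1038-7");
```

An extension like the one described could then build its Open Library lookup from the `nss` part alone.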

PS: and yes, I can live with it not being there if you say no ;)

[1]: https://addons.opera.com/fr/extensions/details/quotelink/?display=en

-- 
Karl Dubost - http://dev.opera.com/
Developer Relations, Opera Software

Re: [whatwg] New URL Standard

2012-09-24 Thread Anne van Kesteren

On Sat, Sep 22, 2012 at 9:10 AM, Alexandre Morgaut
 wrote:
> Would the URLUtil interface replace the "URL decomposition IDL attributes" of 
> the Location interface?
> -> 
> http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-decomposition-idl-attributes
> -> 
> http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#the-location-interface

Yes. My plan is to obsolete most URL parts of HTML.

> Could the search property have a key/value mapping?
> ex:
> http://test.com?param1=value1
> -> var value1 = url.search.param1
> "search" as "window.location" could still be usable as a string

I have been thinking about introducing a .query attribute that would
return a special interface for this purpose, but what the right API
should be seems somewhat tricky. Adam and Erik came up with a solution
that introduces eight new methods (see
http://dvcs.w3.org/hg/url/raw-file/tip/Overview.html#url ) but I hope
we can find something more elegant. (Unless we are stuck with their
solution for some reason, but I believe that is not the case.)

> Shouldn't this document have references on some of the URL related RFCs:

The plan is to obsolete the RFCs. But yes, I will add some references
in the Goals section most likely. Similar to what has been done in the
DOM Standard.

> Should this document include a more complete list of schemes with ones that 
> are more and more used in URLs?

Maybe, kinda depends on what turns out to be the ideal scope for the
URL Standard. For now I only wanted to include those schemes relevant
to the parser (and it may turn out there are a few more of those, e.g.
mailto, javascript, data, and file might need some special casing).

> Unfortunately, the URLUtil interface would not be adapted for them:
> - the "protocol", "host", and "hostname" properties make sense and would work;
> - the query part (search property) is used by the "mailto:" and "sms:" URIs;
> - for "tel:" and "fax", we see "parameters" prefixed by ";" as the ones used 
> in some media types, those parameters could be found in the search property

We might not want to adapt it either because of the relative increase
in complexity while not actually addressing many use cases. You want
to modify query/path for http/https and maybe ws/wss a lot, but not so
much for mailto I'd think.

-- 
http://annevankesteren.nl/

Re: [whatwg] New URL Standard

2012-09-24 Thread Tobie Langel

On Mon, Sep 24, 2012 at 10:58 AM, Anne van Kesteren  wrote:
> The kind of predictability we have for the HTML parser, I want to have for the
> URL parser as well.

Yes, please!!

--tobie

Re: [whatwg] New URL Standard

2012-09-24 Thread Anne van Kesteren

On Fri, Sep 21, 2012 at 5:36 PM, Boris Zbarsky  wrote:
> On 9/21/12 11:16 AM, Anne van Kesteren wrote:
>> * data URLs; in Gecko these appear to be parsed as part of the URL
>> layer, because they can turn a URL invalid. Other browsers do not do
>> this. Opinions? Should data URLs support .search?
>
> I'm not quite sure what you mean by "parsed as part of the URL layer" here.
> What's the concern?

Say you have ; the concern is what e.g.
a.protocol and a.pathname would return here. For invalid URLs they
would return ":" and "" respectively. If we treat this as a valid URL
you would get "data:" and "test". In Gecko I get "http:" and "". If I
make that  Gecko will give meaningful
answers (well pathname is still "", maybe that is okay and pathname
should only work for hierarchical URLs).

More general, what I want is that for *any* given input in , xhr.open("GET", ...), new URL(...), etc. I want to be
able to tell what the various URL components are going to be. The kind
of predictability we have for the HTML parser, I want to have for the
URL parser as well.

(If that means handling data URLs at the layer of the URL parser
rather than a separate parser that goes over the path, as Gecko
appears to be doing, so be it.)

>> * Advice on file URLs would be nice.
>
> Abandon Hope All Ye Who Enter Here?  ;)
>
> If you want constructive advice, it would be interesting to get a full list
> of all the weird stuff that UAs do here so we can evaluate which parts of it
> are needed and why.  I can try to produce such a list for Gecko, if there
> seems to be motion on the general idea.

I think that would be a great start. I'm happy to start out with
Gecko's behavior and iterate over time as feedback comes in from other
browsers.

-- 
http://annevankesteren.nl/

Re: [whatwg] New URL Standard

2012-09-23 Thread Maciej Stachowiak


Excellent work.

Did you use tests while making this and if so did you save them? It might be 
worthwhile to check all the browsers against the spec.

Cheers,
Maciej

On Sep 21, 2012, at 8:16 AM, Anne van Kesteren  wrote:

> I took a crack at defining URLs: http://url.spec.whatwg.org/
> 
> At the moment it defines parsing (minus domain names / IP addresses)
> and the JavaScript API (minus the query manipulation methods proposed
> by Adam Barth). It defines things like setting .pathname to "hello
> world" (notice the space), it defines what happens if you resolve
> "http:test" against a data URL (you get "http://test/") or
> http://teehee (you get "http://teehee/test"). It is based on the
> various URL code paths found in WebKit and Gecko and supports the \ as
> / in various places because it seemed better for compatibility.
> 
> I'm looking for some feedback/ideas on how to handle various aspects, e.g.:
> 
> * data URLs; in Gecko these appear to be parsed as part of the URL
> layer, because they can turn a URL invalid. Other browsers do not do
> this. Opinions? Should data URLs support .search?
> * In the current text only a select few URLs support host/port/query.
> The rest is solely path/fragment. But maybe we want mailto to support
> query? Should it support host? (mailto supporting e.g. host would also
> mean normalising host via IDNA toASCII and friends. Not sure I'm fond
> of that.)
> * Advice on file URLs would be nice.
> * IDNA: what are your plans? IDNA2003 / IDNA2008 / UTS #46 / something
> else? It would be nice to get agreement on this.
> * Terminology: should we align the terminology with the API or would
> that just be too confusing?
> 
> Thanks!
> 
> 
> PS: It also does the query encoding thing correctly for the first time
> ever in the history of URL standards although the wording can probably
> be improved.
> 
> 
> -- 
> http://annevankesteren.nl/

Re: [whatwg] New URL Standard

2012-09-22 Thread Alexandre Morgaut


Thanks Anne, I'd appreciate being able to easily get a URLUtil interface from a 
URL string without resorting to nasty hacks.

I have a few questions.

Would the URLUtil interface replace the "URL decomposition IDL attributes" of 
the Location interface?
-> 
http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-decomposition-idl-attributes
-> 
http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#the-location-interface

Could the search property have a "query" and/or "params" (see "tel:" and "fax:" 
below) alias?

Could the search property have a key/value mapping?
ex:
http://test.com?param1=value1
-> var value1 = url.search.param1
"search" as "window.location" could still be usable as a string


Shouldn't this document have references on some of the URL related RFCs:

- Uniform Resource Locators (URL)
-> http://tools.ietf.org/html/rfc1738

- The "data" URL scheme
-> http://tools.ietf.org/html/rfc2397

- Uniform Resource Identifier (URI): Generic Syntax
-> http://tools.ietf.org/html/rfc3986

Should this document include a more complete list of schemes with ones that are 
more and more used in URLs?
ex:
- "mailto:"
-> https://tools.ietf.org/html/rfc2368
-> https://tools.ietf.org/html/rfc6068
- "tel:", "fax:"
-> https://tools.ietf.org/html/rfc2806
-> https://tools.ietf.org/html/rfc3966
- "sms:"
-> http://tools.ietf.org/html/rfc5724
- "tv:"
-> http://tools.ietf.org/html/rfc2838
Unfortunately, the URLUtil interface would not be adapted for them:
- the "protocol", "host", and "hostname" properties make sense and would work;
- the query part (search property) is used by the "mailto:" and "sms:" URIs;
- for "tel:" and "fax:", we see "parameters" prefixed by ";" as the ones used in 
some media types, those parameters could be found in the search property


PS:
Note that the fax: scheme could be supported in a form or via XHR to send PDF 
documents, PostScript documents, HTML documents with their potential CSS print...
But that would be another discussion.

On 21 sept. 2012, at 17:16, Anne van Kesteren wrote:

> I took a crack at defining URLs: http://url.spec.whatwg.org/
>
> At the moment it defines parsing (minus domain names / IP addresses)
> and the JavaScript API (minus the query manipulation methods proposed
> by Adam Barth). It defines things like setting .pathname to "hello
> world" (notice the space), it defines what happens if you resolve
> "http:test" against a data URL (you get "http://test/") or
> http://teehee (you get "http://teehee/test"). It is based on the
> various URL code paths found in WebKit and Gecko and supports the \ as
> / in various places because it seemed better for compatibility.
>
> I'm looking for some feedback/ideas on how to handle various aspects, e.g.:
>
> * data URLs; in Gecko these appear to be parsed as part of the URL
> layer, because they can turn a URL invalid. Other browsers do not do
> this. Opinions? Should data URLs support .search?
> * In the current text only a select few URLs support host/port/query.
> The rest is solely path/fragment. But maybe we want mailto to support
> query? Should it support host? (mailto supporting e.g. host would also
> mean normalising host via IDNA toASCII and friends. Not sure I'm fond
> of that.)
> * Advice on file URLs would be nice.
> * IDNA: what are your plans? IDNA2003 / IDNA2008 / UTS #46 / something
> else? It would be nice to get agreement on this.
> * Terminology: should we align the terminology with the API or would
> that just be too confusing?
>
> Thanks!
>
>
> PS: It also does the query encoding thing correctly for the first time
> ever in the history of URL standards although the wording can probably
> be improved.
>
>
> --
> http://annevankesteren.nl/





Alexandre Morgaut
Wakanda Community Manager

4D SAS
60, rue d'Alsace
92110 Clichy
France

Standard : +33 1 40 87 92 00
Email :alexandre.morg...@4d.com
Web :  www.4D.com

Re: [whatwg] New URL Standard

2012-09-21 Thread Julian Reschke

On 2012-09-21 17:16, Anne van Kesteren wrote:

> I took a crack at defining URLs: http://url.spec.whatwg.org/
>
> At the moment it defines parsing (minus domain names / IP addresses)
> and the JavaScript API (minus the query manipulation methods proposed
> by Adam Barth). It defines things like setting .pathname to "hello
> world" (notice the space), it defines what happens if you resolve
> "http:test" against a data URL (you get "http://test/") or

As per RFC 3986, Section 5.2 ("Relative Resolution"), the answer IMHO is 
"http:test".

Fetching from that URI indeed used http://test/ (just checked in 
Mozilla), so it appears we have a terminology problem. It would be good 
if we could avoid confusing "relative reference resolution" with what 
you try to define here.

Note that the term "resolve" is widely used for what RFC 3986 Section 
5.2 defines; see, for instance, 
.
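
The distinction drawn above can be sketched in code. This is a rough illustration, not a full resolver: it covers only the first branch of RFC 3986 Section 5.2.2 for a strict parser, where a reference that carries its own scheme is already the target URI. The helper names hasScheme and strictTarget are made up for this sketch, and "http:test" contains no dot segments, so removing them would leave it unchanged.

```javascript
// Scheme grammar from RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
const SCHEME = /^[A-Za-z][A-Za-z0-9+.-]*:/;

// Hypothetical helper: does the reference begin with a scheme?
function hasScheme(reference) {
  return SCHEME.test(reference);
}

// Hypothetical helper: strict-parser target for references that have a scheme.
// Returns null when the base URI would be needed; that merge is not sketched.
function strictTarget(reference) {
  return hasScheme(reference) ? reference : null;
}

console.log(strictTarget("http:test")); // the reference itself
console.log(strictTarget("test"));      // null: needs the base URI
```

Under this reading the base URI never enters the picture for "http:test", which is why the RFC 3986 answer differs from what browsers actually fetch.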

> ...

> http://teehee (you get "http://teehee/test"). It is based on the
> various URL code paths found in WebKit and Gecko and supports the \ as
> / in various places because it seemed better for compatibility.
>
> I'm looking for some feedback/ideas on how to handle various aspects, e.g.:
>
> * data URLs; in Gecko these appear to be parsed as part of the URL
> layer, because they can turn a URL invalid. Other browsers do not do
> this. Opinions? Should data URLs support .search?

> ...

I believe the behavior should be predictable and consistent no matter 
what the URI scheme is.

Best regards, Julian

PS: and no, I don't think "URL Standard" is a good name for this document.

Re: [whatwg] New URL Standard

2012-09-21 Thread Boris Zbarsky


On 9/21/12 11:16 AM, Anne van Kesteren wrote:

> It is based on the
> various URL code paths found in WebKit and Gecko and supports the \ as
> / in various places because it seemed better for compatibility.


Or worse, depending on your use cases...


> * data URLs; in Gecko these appear to be parsed as part of the URL
> layer, because they can turn a URL invalid. Other browsers do not do
> this. Opinions? Should data URLs support .search?


I'm not quite sure what you mean by "parsed as part of the URL layer" 
here.  What's the concern?



> * Advice on file URLs would be nice.


Abandon Hope All Ye Who Enter Here?  ;)

If you want constructive advice, it would be interesting to get a full 
list of all the weird stuff that UAs do here so we can evaluate which 
parts of it are needed and why.  I can try to produce such a list for 
Gecko, if there seems to be motion on the general idea.



> PS: It also does the query encoding thing correctly for the first time
> ever in the history of URL standards


\o/

-Boris

[whatwg] New URL Standard

2012-09-21 Thread Anne van Kesteren

I took a crack at defining URLs: http://url.spec.whatwg.org/

At the moment it defines parsing (minus domain names / IP addresses)
and the JavaScript API (minus the query manipulation methods proposed
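by Adam Barth). It defines things like setting .pathname to "hello
world" (notice the space), it defines what happens if you resolve
"http:test" against a data URL (you get "http://test/") or
http://teehee (you get "http://teehee/test"). It is based on the
various URL code paths found in WebKit and Gecko and supports the \ as
/ in various places because it seemed better for compatibility.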
various URL code paths found in WebKit and Gecko and supports the \ as
/ in various places because it seemed better for compatibility.
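
The cases mentioned above can be tried with the URL class that browsers and Node.js expose today. This is only a sketch: it assumes a present-day implementation of the URL API, which may differ in detail from the draft as it stood when this message was written, and the function names and the "data:text/plain,hi" base are made up for illustration.

```javascript
// Resolve "http:test" against two different base URLs.
function resolveExamples() {
  return {
    // Base has a different scheme, so "test" ends up as the host.
    againstData: new URL("http:test", "data:text/plain,hi").href,
    // Base shares the http scheme, so "test" is treated as a relative path.
    againstHttp: new URL("http:test", "http://teehee").href,
  };
}

// Setting .pathname to a string containing a space percent-encodes the space.
function setPathWithSpace() {
  const url = new URL("http://teehee/");
  url.pathname = "hello world";
  return url.href;
}

// In an http URL a backslash is accepted where a slash is expected.
function backslashAsSlash() {
  return new URL("http://teehee\\a\\b").pathname;
}

console.log(resolveExamples());
console.log(setPathWithSpace());
console.log(backslashAsSlash());
```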

I'm looking for some feedback/ideas on how to handle various aspects, e.g.:

* data URLs; in Gecko these appear to be parsed as part of the URL
layer, because they can turn a URL invalid. Other browsers do not do
this. Opinions? Should data URLs support .search?
* In the current text only a select few URLs support host/port/query.
The rest is solely path/fragment. But maybe we want mailto to support
query? Should it support host? (mailto supporting e.g. host would also
mean normalising host via IDNA toASCII and friends. Not sure I'm fond
of that.)
* Advice on file URLs would be nice.
* IDNA: what are your plans? IDNA2003 / IDNA2008 / UTS #46 / something
else? It would be nice to get agreement on this.
* Terminology: should we align the terminology with the API or would
that just be too confusing?

Thanks!


PS: It also does the query encoding thing correctly for the first time
ever in the history of URL standards although the wording can probably
be improved.


-- 
http://annevankesteren.nl/

