Re: [whatwg] New URL Standard

2012-10-06 Thread Glenn Maynard

On Tue, Sep 25, 2012 at 10:25 PM, Ian Hickson i...@hixie.ch wrote:

  You could even make that work, by having a special method for appending a
 new key/value pair, and just not making it accessible.


Right, other access methods, like this or a classList-like array, can
always be added later.  (Actually, key/value pairs appended like this would
still be accessible with Tab's suggestion, it's just the resulting key
order that it doesn't expose.)

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-25 Thread Robin Berjon


On 25/09/2012 01:07 , Glenn Maynard wrote:

 On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

 I suggest just making it a map from String->[String].  You probably
 want a little bit of magic - if the setter receives an array, replace
 the current value with it; anything else, stringify then wrap in an
 array and replace the current value.  The getter should return an
 empty array for non-existing params.  You should be able to set .query
 itself with an object, which empties out the map and then runs the
 setter over all the items.  Bam, every single method is now obsolete.


 When should this API guarantee that it round-trips URLs cleanly (aside from
 quoting differences)?  For example, maintaining order in "a=1&b=2&a=1", and
 representing things like "a=1&b" (no '=') and "a&&b" (no key at all).


And round-tripping using ";" as the separator instead of "&". I mention this 
because I've seen actual production code (more than once) that relied on 
this. I have no idea how common it is though. I'm guessing not too much, 
but probably some since it was in HTML 4.01:


http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2

Of course another option is to just not parse that into key-value pairs 
in the first place.



 By the way, it would also be nice for the query part of this API to be
 usable in isolation.


+1

--
Robin Berjon - http://berjon.com/ - @robinberjon

Re: [whatwg] New URL Standard

2012-09-25 Thread Alexandre Morgaut


On 25 sept. 2012, at 13:48, Robin Berjon wrote:

 On 25/09/2012 01:07 , Glenn Maynard wrote:
 And round-tripping using ";" as the separator instead of "&". I mention this
 because I've seen actual production code (more than once) that relied on
 this. I have no idea how common it is though. I'm guessing not too much,
 but probably some since it was in HTML 4.01:

 http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2

 Of course another option is to just not parse that into key-value pairs
 in the first place.

Technically, ";" might also be interpreted as part of the value.
I think treating it as a separator would introduce more problems than 
it would solve (my 2 cents).
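
For what it's worth, the URLSearchParams interface that later shipped in
browsers and Node.js (it postdates this thread) splits pairs on "&" only,
so a ";" ends up inside the value. A minimal sketch; the sample query
strings are made up:

```javascript
// URLSearchParams splits pairs on "&" only; ";" is ordinary data.
const semi = new URLSearchParams("a=1;b=2");
const amp = new URLSearchParams("a=1&b=2");

const semiKeys = [...semi.keys()]; // a single key, "a"
const semiValue = semi.get("a");   // the ";" stays inside the value
const ampKeys = [...amp.keys()];   // two keys, "a" and "b"

console.log(semiKeys, semiValue, ampKeys);
```

A server that does treat ";" as a separator would see two parameters in
the first string, which is exactly the mismatch described above.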


 By the way, it would also be nice for the query part of this API to be
 usable in isolation.

 +1

The query part should still be accessible via the search property.





Alexandre Morgaut
Wakanda Community Manager


Re: [whatwg] New URL Standard

2012-09-25 Thread Silvia Pfeiffer

On Tue, Sep 25, 2012 at 9:48 PM, Robin Berjon ro...@w3.org wrote:
 On 25/09/2012 01:07 , Glenn Maynard wrote:

 On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr.
 jackalm...@gmail.com wrote:

 I suggest just making it a map from String->[String].  You probably
 want a little bit of magic - if the setter receives an array, replace
 the current value with it; anything else, stringify then wrap in an
 array and replace the current value.  The getter should return an
 empty array for non-existing params.  You should be able to set .query
 itself with an object, which empties out the map and then runs the
 setter over all the items.  Bam, every single method is now obsolete.


 When should this API guarantee that it round-trips URLs cleanly (aside
 from quoting differences)?  For example, maintaining order in
 "a=1&b=2&a=1", and representing things like "a=1&b" (no '=') and "a&&b"
 (no key at all).


 And round-tripping using ";" as the separator instead of "&". I mention this
 because I've seen actual production code (more than once) that relied on
 this. I have no idea how common it is though. I'm guessing not too much, but
 probably some since it was in HTML 4.01:

 http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2

 Of course another option is to just not parse that into key-value pairs in
 the first place.

I have also seen key-value pairs separated both by "&" and by ";", but
not in real life in quite some time. See also the discussion here:
[1]. For media fragment URIs we chose to only recommend use of "&" [2]
(see section 51. "& is the only primary separator for name-value
pairs, but some server-side languages also treat ; as a separator.").

Cheers,
Silvia.

[1] https://discussion.dreamhost.com/thread-134179.html
[2] http://www.w3.org/TR/media-frags/

Re: [whatwg] New URL Standard

2012-09-25 Thread Anne van Kesteren

On Tue, Sep 25, 2012 at 6:18 AM, Ian Hickson i...@hixie.ch wrote:
 Not necessarily, but that's certainly possible. Personally I would
 recommend that we not change the definition of what is conforming from the
 current RFC3986/RFC3987 rules, except to the extent that the character
 encoding affects it (as per the HTML standard today).

http://whatwg.org/html#valid-url

FWIW, given that browsers happily do requests to servers with
characters in the URL that are invalid per the RFC (they are not URL
escaped) and servers handle them fine I think we should make the
syntax more lenient. E.g. allowing [ and ] in the path and query
component is fine I think.


As for the question about why not build this on top of RFC 3986. That
does not handle non-ASCII code points. RFC 3987 does, but is not a
suitable start either. As shown in http://url.spec.whatwg.org/ it is
quite trivial to combine parsing, resolving, and canonicalizing into a
single algorithm (and deal with URI/IRI, now URL, as one). Trying to
somehow patch the language in RFC 3987 to deal with the encoding
problems for the query component, to deal with parsing
http:example.org when there is a base URL with the same scheme versus
when there isn't, etc. is way more of a hassle I think, though I am
happy to be proven wrong.
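
The "http:example.org" case can be observed in the URL API that browsers
and Node.js ship today. That API postdates this thread, so take this as an
illustration of where the parser ended up rather than of the 2012 draft.
The base URLs below are made up:

```javascript
// Same input string, parsed with and without a same-scheme base URL.
const input = "http:example.org";

// No base: the missing slashes are tolerated and example.org is the host.
const standalone = new URL(input).href;

// Base with the same scheme: the input is treated as a relative reference.
const withSameSchemeBase = new URL(input, "http://base.example/dir/page").href;

// Base with a different scheme: parsed as if there were no base at all.
const withOtherSchemeBase = new URL(input, "https://base.example/dir/page").href;

console.log(standalone, withSameSchemeBase, withOtherSchemeBase);
```

The same string thus names a host in one context and a path segment in
another, which is the kind of context-dependent behaviour that is awkward
to bolt onto the RFC grammar.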


-- 
http://annevankesteren.nl/

Re: [whatwg] New URL Standard

2012-09-25 Thread David Sheets

On Mon, Sep 24, 2012 at 9:18 PM, Ian Hickson i...@hixie.ch wrote:

 This is Anne's spec, so I'll let him give more canonical answers, but:

 On Mon, 24 Sep 2012, David Sheets wrote:

 Your conforming WHATWG-URL syntax will have production rule alphabets
 which are supersets of the alphabets in RFC3986.

 Not necessarily, but that's certainly possible. Personally I would
 recommend that we not change the definition of what is conforming from the
 current RFC3986/RFC3987 rules, except to the extent that the character
 encoding affects it (as per the HTML standard today).

http://whatwg.org/html#valid-url

I believe the '#' character in the fragment identifier qualifies.

 This is what I propose you define and it does not necessarily have to be
 in BNF (though a production rule language of some sort probably isn't a
 bad idea).

 We should definitely define what is a conforming URL, yes (either
 directly, or by reference to the RFCs, as HTML does now). Whether prose or
 a structured language is the better way to go depends on what the
 conformance rules are -- HTML is a good example here: it has parts that
 are defined in terms of prose (e.g. the HTML syntax as a whole), and other
 parts that are defined in terms of BNF (e.g. constraints on the contents
 of script elements in certain situations). It's up to Anne.

HTML is far larger and more compositional than URI. I am confident
that, no matter what is specified in the WHATWG New URL Standard, a
formal language exists which can describe the structure of conforming
identifiers. If no such formal language can be described, the syntax
specification is likely to be incomplete or unsound.

 How will WHATWG-URLs which use the syntax extended from RFC3986 map into
 RFC3986 URI references for systems that only support those?

 The same way that those systems handle invalid URLs today, I would assume.
 Do you have any concrete systems in mind here? It would be good to add
 them to the list of systems that we test. (For what it's worth, in
 practice, I've never found software that exactly followed RFC3986 and
 also rejected any non-conforming strings. There are just too many invalid
 URLs out there for that to be a viable implementation strategy.)

It is not the rejection of incoming nonconforming reference
identifiers that causes issues but rather the emission of strictly
conforming identifiers by Postel's Law (Robustness Principle). I know
of several URI implementations that, given a nonconforming reference
identifier, will only output conforming identifiers. Indeed, the
standard under discussion will behave in exactly this way.

This leads to loss of information in chains of URI processors that can
and will change the meaning of identifiers.

 I remember when I was testing this years ago, when doing the first pass on
 attempting to fix this, that I found that some less widely tested
 software, e.g. wget(1), did not handle URLs in the same manner as more
 widely tested software, e.g. IE, with the result being that Web pages were
 not handled interoperably between these two software classes. This is the
 kind of thing we want to stop, by providing a single way to parse all
 input strings, valid or invalid, as URLs.

Was wget in violation of the RFC? Was IE more lenient?

If every string, valid or invalid, is parseable as a URI reference, is
there an algorithm to accurately extract URIs from plain text?


Re: [whatwg] New URL Standard

2012-09-25 Thread David Sheets

On Tue, Sep 25, 2012 at 8:03 AM, Anne van Kesteren ann...@annevk.nl wrote:
 On Tue, Sep 25, 2012 at 6:18 AM, Ian Hickson i...@hixie.ch wrote:
 Not necessarily, but that's certainly possible. Personally I would
 recommend that we not change the definition of what is conforming from the
 current RFC3986/RFC3987 rules, except to the extent that the character
 encoding affects it (as per the HTML standard today).

http://whatwg.org/html#valid-url

 FWIW, given that browsers happily do requests to servers with
 characters in the URL that are invalid per the RFC (they are not URL
 escaped) and servers handle them fine I think we should make the
 syntax more lenient. E.g. allowing [ and ] in the path and query
 component is fine I think.

I believe this would introduce ambiguity for parsing URI references.
Is [::1] an authority reference or a path segment reference?

 As for the question about why not build this on top of RFC 3986. That
 does not handle non-ASCII code points. RFC 3987 does, but is not a
 suitable start either. As shown in http://url.spec.whatwg.org/ it is
 quite trivial to combine parsing, resolving, and canonicalizing into a
 single algorithm (and deal with URI/IRI, now URL, as one).

Composition is often trivial but unenlightening. There is necessarily
less information in a partially evaluated function composition than in
the functions in isolation.

Defining a formal language accurately and in a broadly understandable
manner is nontrivial. Your task is nontrivial.

 Trying to
 somehow patch the language in RFC 3987 to deal with the encoding
 problems for the query component, to deal with parsing
 http:example.org when there is a base URL with the same scheme versus
 when there isn't, etc. is way more of a hassle I think, though I am
 happy to be proven wrong.

I believe the encoding problems are handled by a normalization
algorithm and parsing relative references is handled by the base
scheme module.

What is the acceptable trade-off between (y)our hassle and the time of
technologists in the coming decades? Will you make it easier or harder
for them to reconcile WHATWG-URL and Internet Standard 66 (RFC 3986)?


Re: [whatwg] New URL Standard

2012-09-25 Thread Ian Hickson

On Tue, 25 Sep 2012, David Sheets wrote:
 
  Not necessarily, but that's certainly possible. Personally I would 
  recommend that we not change the definition of what is conforming from 
  the current RFC3986/RFC3987 rules, except to the extent that the 
  character encoding affects it (as per the HTML standard today).
 
 http://whatwg.org/html#valid-url
 
 I believe the '#' character in the fragment identifier qualifies.

Not sure what you mean.

Sounds like Anne is indeed expecting to widen the range of valid URLs 
though, so please disregard my comments on the matter. :-)


  We should definitely define what is a conforming URL, yes (either 
  directly, or by reference to the RFCs, as HTML does now). Whether 
  prose or a structured language is the better way to go depends on what 
  the conformance rules are -- HTML is a good example here: it has parts 
  that are defined in terms of prose (e.g. the HTML syntax as a whole), 
  and other parts that are defined in terms of BNF (e.g. constraints on 
  the contents of script elements in certain situations).
 
 HTML is far larger and more compositional than URI. I am confident that, 
 no matter what is specified in the WHATWG New URL Standard, a formal 
 language exists which can describe the structure of conforming 
 identifiers. If no such formal language can be described, the syntax 
 specification is likely to be incomplete or unsound.

Just because it's possible to use a formal language doesn't mean it's a 
good idea. It depends how clear it is. In the HTML spec, there are places 
where I've actually used a hybrid, using BNF with some terminals defined 
using prose because defining them in BNF, while possible, is confusing.


  How will WHATWG-URLs which use the syntax extended from RFC3986 map 
  into RFC3986 URI references for systems that only support those?
 
  The same way that those systems handle invalid URLs today, I would 
  assume. Do you have any concrete systems in mind here? It would be 
  good to add them to the list of systems that we test. (For what it's 
  worth, in practice, I've never found software that exactly followed 
  RFC3986 and also rejected any non-conforming strings. There are just 
  too many invalid URLs out there for that to be a viable implementation 
  strategy.)
 
 It is not the rejection of incoming nonconforming reference identifiers 
 that causes issues but rather the emission of strictly conforming 
 identifiers by Postel's Law (Robustness Principle). I know of several 
 URI implementations that, given a nonconforming reference identifier, 
 will only output conforming identifiers. Indeed, the standard under 
 discussion will behave in exactly this way.
 
 This leads to loss of information in chains of URI processors that can 
 and will change the meaning of identifiers.

I don't really follow. If you have any concrete examples that would really 
help.


  I remember when I was testing this years ago, when doing the first 
  pass on attempting to fix this, that I found that some less widely 
  tested software, e.g. wget(1), did not handle URLs in the same manner 
  as more widely tested software, e.g. IE, with the result being that 
  Web pages were not handled interoperably between these two software 
  classes. This is the kind of thing we want to stop, by providing a 
  single way to parse all input strings, valid or invalid, as URLs.
 
 Was wget in violation of the RFC? Was IE more lenient?

The RFC is so vague about what to do with non-conforming content that it's 
really hard to say which was in violation or more lenient.

But in any case that's the wrong way to look at it. There's legacy 
content, there's implementations, and there's the spec. The spec is (or 
should be) the most mutable of these; its goal should be to define how 
implementations should behave in order to make the content work 
interoperably amongst all of the implementations, and to define the best 
practice for content creators to avoid known dangers.


 If every string, valid or invalid, is parseable as a URI reference, is 
 there an algorithm to accurately extract URIs from plain text?

That would be an interesting thing to define, but in practice I don't 
think it's something implementors would care to follow. People tend to 
write URL fragments and expect them to be linked. For example, if I write, 
in an e-mail, the string "google.com", people expect "google.com" to become 
a link to <http://google.com/> and for the comma to be ignored. Similarly, 
if I have a page on an intranet server and I write "intranet/ianh/plan.txt", 
it would be useful if that was turned into a link to the file. But there's 
nothing to distinguish that from me writing "freezing/ice/273.23K", which 
isn't intended to be a URL at all.

Given this, I think plain text renderers will be stuck with heuristics for 
some time to come. (Maybe even heuristics that involve actual DNS queries 
and HEAD requests to see if potential URLs are useful.)
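
To illustrate why this stays heuristic, here is a deliberately naive
sketch. The function name guessLink and the KNOWN_TLDS allowlist are
invented for this example and are not taken from any real linkifier. It
links "google.com," while dropping the comma, but it has no way to tell
the intranet path from the temperature, so it rejects both:

```javascript
// Hypothetical heuristic: link bare tokens that look like "host.tld[/path]".
// KNOWN_TLDS is a tiny made-up allowlist; real linkifiers use far more rules.
const KNOWN_TLDS = new Set(["com", "org", "net", "ch"]);

function guessLink(token) {
  // Trailing sentence punctuation is assumed not to be part of the URL.
  const trimmed = token.replace(/[.,;:!?)]+$/, "");
  const host = trimmed.split("/")[0];
  const labels = host.split(".");
  const tld = labels[labels.length - 1].toLowerCase();
  if (labels.length < 2 || !KNOWN_TLDS.has(tld)) return null;
  // Add a root path when the token had none.
  const suffix = trimmed.includes("/") ? "" : "/";
  return "http://" + trimmed + suffix;
}

const guesses = ["google.com,", "intranet/ianh/plan.txt", "freezing/ice/273.23K"]
  .map(guessLink);
console.log(guesses);
```

Getting the intranet case right would need outside knowledge, such as the
DNS lookups mentioned above.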

-- 
Ian Hickson

Re: [whatwg] New URL Standard

2012-09-25 Thread Anne van Kesteren

On Tue, Sep 25, 2012 at 8:20 PM, David Sheets kosmo...@gmail.com wrote:
 On Tue, Sep 25, 2012 at 8:03 AM, Anne van Kesteren ann...@annevk.nl wrote:
 FWIW, given that browsers happily do requests to servers with
 characters in the URL that are invalid per the RFC (they are not URL
 escaped) and servers handle them fine I think we should make the
 syntax more lenient. E.g. allowing [ and ] in the path and query
 component is fine I think.

 I believe this would introduce ambiguity for parsing URI references.
 Is [::1] an authority reference or a path segment reference?

Path.
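
The URL parser in today's browsers and Node.js (which postdates this
thread) behaves the same way: without a leading "//", the brackets land in
the path. A quick check, with a made-up base URL:

```javascript
const base = "http://example.com/a/";

// No "//" prefix: "[::1]" is resolved as a relative path segment.
const asPath = new URL("[::1]", base);

// With "//" it is a network-path reference, so "[::1]" is the host.
const asHost = new URL("//[::1]/", base);

console.log(asPath.pathname, asHost.hostname);
```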


 As for the question about why not build this on top of RFC 3986. That
 does not handle non-ASCII code points. RFC 3987 does, but is not a
 suitable start either. As shown in http://url.spec.whatwg.org/ it is
 quite trivial to combine parsing, resolving, and canonicalizing into a
 single algorithm (and deal with URI/IRI, now URL, as one).

 Composition is often trivial but unenlightening. There is necessarily
 less information in a partially evaluated function composition than in
 the functions in isolation.

 Defining a formal language accurately and in a broadly understandable
 manner is nontrivial. Your task is nontrivial.

I have no idea what you are talking about.


 What is the acceptable trade-off between (y)our hassle and the time of
 technologists in the coming decades? Will you make it easier or harder
 for them to reconcile WHATWG-URL and Internet Standard 66 (RFC 3986)?

I'm not sure why I should care about STD 66. It is inaccurate, does
not match implementations, and cannot be used to write new
implementations that want to be compatible with content and services
on the web. I am tackling those problems, and writing them down in a
way we have written standards for over eight years now, which thus far
has been successful.

(Obviously STD 66 is a document many people value, but these people
generally have not looked at the particulars or written software that
deals with Location headers whose values contain spaces, etc. assuming
they have a correct STD 66 implementation to begin with. If there is
a document that addresses URLs on the web better, they will use that
instead.)


-- 
http://annevankesteren.nl/

Re: [whatwg] New URL Standard

2012-09-25 Thread Glenn Maynard

On Mon, Sep 24, 2012 at 7:18 PM, David Sheets kosmo...@gmail.com wrote:

 Always. The appropriate interface is (string * string?) list. Id est,
 an association list of keys and nullable values (null is
 key-without-value and empty string is empty-value). If you prefer to
 not use a nullable value and don't like tuple representations in JS,
 you could use type: string list list

 i.e.


 [[key_without_value],[],[key,value],[],[numbers,1,2,3,4],[,],[,,]]


This isn't an appropriate interface.  It's terrible for 99.9% of use cases,
where you really want dictionary-like access.

The right approach is probably to expose the results in an object-like
form, as Tab suggests, but to store the state internally in a list-like
format, with modifications defined in terms of mutations to the list.

That is, parsing "a=1&b=2&a=3" would result in an internal representation
like [('a', '1'), ('b', '2'), ('a', '3')].  When viewed from script, you
see {a: ['1', '3'], 'b': ['2']}.  If you serialize it right back to a URL
the internal representation is unchanged, so the original order is
preserved.  The mutation algorithms can then do their best to preserve the
list as reasonably as they can (eg. assigning query.a = ['5', '6'] would
remove all 'a' keys, then insert items at the location of the first removed
item, or append if there were none).
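
A rough sketch of that idea follows. The function names (parseQuery,
toObjectView, assign, serialize) are invented for illustration, and
percent-decoding is left out entirely, so this shows the shape of the
approach rather than a proposed algorithm:

```javascript
// Internal state: an ordered list of [key, value] pairs (no decoding here).
function parseQuery(query) {
  return query.split("&").filter(Boolean).map((part) => {
    const eq = part.indexOf("=");
    return eq < 0 ? [part, ""] : [part.slice(0, eq), part.slice(eq + 1)];
  });
}

// Script-visible view: each key maps to an array of its values.
function toObjectView(pairs) {
  const view = Object.create(null);
  for (const [key, value] of pairs) {
    (view[key] = view[key] || []).push(value);
  }
  return view;
}

// Assignment: drop all pairs for `key`, then insert the new values where
// the first removed pair was, or append if the key was absent.
function assign(pairs, key, values) {
  const first = pairs.findIndex(([k]) => k === key);
  const kept = pairs.filter(([k]) => k !== key);
  const at = first < 0 ? kept.length : first;
  kept.splice(at, 0, ...values.map((v) => [key, String(v)]));
  return kept;
}

const serialize = (pairs) => pairs.map(([k, v]) => k + "=" + v).join("&");

const parsed = parseQuery("a=1&b=2&a=3");
const roundTripped = serialize(parsed);
const reassigned = serialize(assign(parsed, "a", ["5", "6"]));
console.log(toObjectView(parsed), roundTripped, reassigned);
```

Because the list is the source of truth, serializing an untouched query
returns the original order, while the object view is derived on demand.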

 Is this not already supported by creating a new URL which contains
 only a relative query part?

 Like: query = new URL("?a=b&c=d"); query.query["a"] = "x";
 query.toString() == "?a=x&c=d";

 Why is a new interface necessary?


That won't work, since "?a=b&c=d" isn't a valid URL.  The invalid flag will
be set, so the change to .query will be a no-op, and .href (presumably what
toString will invoke) would return the original URL, "?a=b&c=d", not
"?a=x&c=d".  You'd need to do something like:

var query = new URL("http://example.com?" + url.hash);
query.query.a = "x";
url.hash = query.search.slice(1); // remove the leading ?

That's awkward, but maybe it's good enough.

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-25 Thread David Sheets

On Tue, Sep 25, 2012 at 2:13 PM, Glenn Maynard gl...@zewt.org wrote:
 On Mon, Sep 24, 2012 at 7:18 PM, David Sheets kosmo...@gmail.com wrote:

 Always. The appropriate interface is (string * string?) list. Id est,

 an association list of keys and nullable values (null is
 key-without-value and empty string is empty-value). If you prefer to
 not use a nullable value and don't like tuple representations in JS,
 you could use type: string list list

 i.e.


 [[key_without_value],[],[key,value],[],[numbers,1,2,3,4],[,],[,,]]


 This isn't an appropriate interface.  It's terrible for 99.9% of use cases,
 where you really want dictionary-like access.

This is the direct representation of the query string key-value convention.

Looking up keys is easy in an association list. Filtering the list
retains ordering. Appending to the list is well-defined. Folding into
a dictionary is trivial and key merging can be defined according to
the author's URL convention.
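
One possible reading of these four operations in JavaScript, as a sketch
only. The variable names and the merge policy used in the fold are
assumptions made for this example, not something specified in the thread:

```javascript
// An association list: ordered [key, value] pairs; null means "no '='".
const pairs = [["a", "1"], ["b", null], ["a", "3"]];

// Lookup: all values for a key, in order of appearance.
const lookup = (list, key) => list.filter(([k]) => k === key).map(([, v]) => v);

// Filtering keeps the remaining pairs in their original order.
const withoutB = pairs.filter(([k]) => k !== "b");

// Appending is just adding a pair at the end.
const appended = [...pairs, ["c", "4"]];

// Folding into a dictionary; here repeated keys are merged into arrays,
// but the merge policy is the caller's choice.
const dict = pairs.reduce((acc, [k, v]) => {
  (acc[k] = acc[k] || []).push(v);
  return acc;
}, Object.create(null));

console.log(lookup(pairs, "a"), withoutB, appended, dict);
```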

 The right approach is probably to expose the results in an object-like form,
 as Tab suggests, but to store the state internally in a list-like format,
 with modifications defined in terms of mutations to the list.

This sounds more complicated to implement while maintaining
invariants. A dictionary with an associated total order is an
association list.

 That is, parsing "a=1&b=2&a=3" would result in an internal representation
 like [('a', '1'), ('b', '2'), ('a', '3')].  When viewed from script, you see
 {a: ['1', '3'], 'b': ['2']}.  If you serialize it right back to a URL the
 internal representation is unchanged, so the original order is preserved.
 The mutation algorithms can then do their best to preserve the list as
 reasonably as they can (eg. assigning query.a = ['5', '6'] would remove all
 'a' keys, then insert items at the location of the first removed item, or
 append if there were none).

Why hide the order?

 Is this not already supported by creating a new URL which contains
 only a relative query part?

 Like: query = new URL("?a=b&c=d"); query.query["a"] = "x";
 query.toString() == "?a=x&c=d";

 Why is a new interface necessary?


 That won't work, since "?a=b&c=d" isn't a valid URL.

"?a=b&c=d" is a valid URI reference. @href="?a=b&c=d" is valid.

 The invalid flag will
 be set, so the change to .query will be a no-op, and .href (presumably what
 toString will invoke) would return the original URL, "?a=b&c=d", not
 "?a=x&c=d".  You'd need to do something like:

 var query = new URL("http://example.com?" + url.hash);
 query.query.a = "x";
 url.hash = query.search.slice(1); // remove the leading ?

 That's awkward, but maybe it's good enough.

This is a use case for parsing without composed relative resolution.

Re: [whatwg] New URL Standard

2012-09-25 Thread Alexandre Morgaut


On 26 sept. 2012, at 00:14, David Sheets wrote:

 On Tue, Sep 25, 2012 at 2:13 PM, Glenn Maynard gl...@zewt.org wrote:
 On Mon, Sep 24, 2012 at 7:18 PM, David Sheets kosmo...@gmail.com wrote:


 The right approach is probably to expose the results in an object-like form,
 as Tab suggests, but to store the state internally in a list-like format,
 with modifications defined in terms of mutations to the list.

Isn't that what the Web Storage API does? There, each key can be found by 
index using the key() method:

http://www.w3.org/TR/webstorage/#dom-storage-key

My only concern is that key() should probably be named getKey() to avoid 
name collisions with parameter names.




Alexandre Morgaut
Wakanda Community Manager


Re: [whatwg] New URL Standard

2012-09-25 Thread Glenn Maynard

On Tue, Sep 25, 2012 at 5:14 PM, David Sheets kosmo...@gmail.com wrote:

 Looking up keys is easy in an association list. Filtering the list
 retains ordering. Appending to the list is well-defined. Folding into
 a dictionary is trivial and key merging can be defined according to
 the author's URL convention.


I'd suggest writing out what you mean in JavaScript or JS-like pseudocode,
demonstrating what it would actually look like to scripts and how it would
be used.  It's the quickest way to get API ideas across.

  The right approach is probably to expose the results in an object-like
 form,
  as Tab suggests, but to store the state internally in a list-like format,
  with modifications defined in terms of mutations to the list.

 This sounds more complicated to implement while maintaining
 invariants. A dictionary with an associated total order is an
 association list.


I think it's pretty straightforward both to specify and to implement.  Of
course, implementations can use any internal data structure they like as
long as the end result is the same.

 Why hide the order?


Because the natural JS interface, object-like access, doesn't allow it.  If
you think there's an API with similar convenience to an object and natural
usage in the language, then feel free to suggest it as I described above.

(Of course, a separate method could exist to get access to the underlying
order, if and when real use cases turn up that actually need it, and it's
not unlikely that there are use cases--but so far they haven't been
raised.  There's nothing wrong with exposing multiple API views into the
same data set, when they have clearly distinct goals and attempts to meet
both sets of goals with the same API fail.)

  Like: query = new URL("?a=b&c=d"); query.query["a"] = "x";
  query.toString() == "?a=x&c=d";

  That won't work, since "?a=b&c=d" isn't a valid URL.

 "?a=b&c=d" is a valid URI reference. @href="?a=b&c=d" is valid.


It's not a valid *absolute* URL, which is what you used above.  You can
sidestep this either by prefixing it to make it into a valid URL (as I
suggested) or by specifying a base URL; they're both pretty much equivalent
here.
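
With the URL and URLSearchParams interfaces that browsers and Node.js ship
today, the base-URL route looks roughly like this. Those interfaces
postdate this thread: the property is searchParams rather than the .query
discussed here, and the constructor throws instead of setting an invalid
flag. The base http://example.com/ is a placeholder:

```javascript
// A bare "?a=b&c=d" is not an absolute URL, so the one-argument form throws.
let threw = false;
try {
  new URL("?a=b&c=d");
} catch (e) {
  threw = e instanceof TypeError;
}

// Supplying a base URL lets the relative reference parse.
const query = new URL("?a=b&c=d", "http://example.com/");
query.searchParams.set("a", "x");

console.log(threw, query.search);
```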

 This is a use case for parsing without composed relative resolution.


Maybe, but that's a pretty complicated approach for this use case.

(To summarize the mechanism he's referring to, as I understand it: the
ability to use this API to parse, modify and output relative URLs without
resolving them to a base URL at all.)

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-25 Thread Boris Zbarsky


On 9/25/12 6:53 PM, Glenn Maynard wrote:

(Of course, a separate method could exist to get access to the underlying
order, if and when real use cases turn up that actually need it, and it's
not unlikely that there are use cases--but so far they haven't been
raised.


The obvious use case is constructing a URI with a given query by hand, 
right?


-Boris

Re: [whatwg] New URL Standard

2012-09-25 Thread Glenn Maynard

On Tue, Sep 25, 2012 at 8:36 PM, Boris Zbarsky bzbar...@mit.edu wrote:

 On 9/25/12 6:53 PM, Glenn Maynard wrote:

 (Of course, a separate method could exist to get access to the underlying
 order, if and when real use cases turn up that actually need it, and it's
 not unlikely that there are use cases--but so far they haven't been
 raised.


 The obvious use case is constructing a URI with a given query by hand,
 right?


If you already have the "a=1&b=2" string, you can just assign it to .search
and not use the prepared-query-parameters interface at all.

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-25 Thread Boris Zbarsky


On 9/25/12 10:13 PM, Glenn Maynard wrote:

The obvious use case is constructing a URI with a given query by
hand, right?

If you already have the "a=1&b=2" string, you can just assign it to
.search and not use the prepared-query-parameters interface at all.


I was thinking more like you have the arrays ["a", "b"] (hardcoded) and 
[1, 2] (provided by user).


-Boris

Re: [whatwg] New URL Standard

2012-09-25 Thread Glenn Maynard

On Tue, Sep 25, 2012 at 9:27 PM, Boris Zbarsky bzbar...@mit.edu wrote:

 On 9/25/12 10:13 PM, Glenn Maynard wrote:

 The obvious use case is constructing a URI with a given query by
 hand, right?

 If you already have the "a=1&b=2" string, you can just assign it to
 .search and not use the prepared-query-parameters interface at all.


 I was thinking more like you have the arrays ["a", "b"] (hardcoded) and
 [1, 2] (provided by user).


You usually don't care about the resulting order in that case, right?
You'd just say something like

assert(key_names.length == user_data.length); // ["a", "b"].length == [1, 2].length
for(var i = 0; i < user_data.length; ++i)
    url.query[key_names[i]] = user_data[i];

When do you care about being able to specifically create (or distinguish)
"a=1&b=2" vs. "b=2&a=1" (or, a bit trickier, "a=1&b=2&a=3")?

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-25 Thread Boris Zbarsky


On 9/25/12 10:36 PM, Glenn Maynard wrote:

You usually don't care about the resulting order in that case, right?


It's not uncommon for servers to depend on a particular order of 
parameters in the query string and totally fail when the ordering is 
different.  Especially the sort of servers that have a .exe for their 
CGI instead of using an off-the-shelf CGI library.



When do you care about being able to specifically create (or
distinguish) "a=1&b=2" vs. "b=2&a=1"


Whenever the server will barf on one of them?  ;)

-Boris

Re: [whatwg] New URL Standard

2012-09-25 Thread Glenn Maynard

On Tue, Sep 25, 2012 at 9:53 PM, Boris Zbarsky bzbar...@mit.edu wrote:

 On 9/25/12 10:36 PM, Glenn Maynard wrote:

 You usually don't care about the resulting order in that case, right?


 It's not uncommon for servers to depend on a particular order of
 parameters in the query string and totally fail when the ordering is
 different.  Especially the sort of servers that have a .exe for their CGI
 instead of using an off-the-shelf CGI library.


  When do you care about being able to specifically create (or
 distinguish) "a=1&b=2" vs. "b=2&a=1"


 Whenever the server will barf on one of them?  ;)


It's easy enough to allow creating a specific ordering of individual items,
by guaranteeing that when a key is assigned to the object, if that key
didn't already exist in the query, it will be added to the end.  That means
you can say

url.query.x = '1';
url.query.y = '2';

vs.

url.query.y = '2';
url.query.x = '1';

to create "x=1&y=2" and "y=2&x=1", respectively.  That's the behavior I'd
expect anyway.  (If the key already existed, it should replace it in its
previous position, of course, not bump it to the end.)

What this doesn't allow is creating things like a=1&b=2&a=3.  You can
create a=1&a=2&b=3 (url.query.a = [1,2]; url.query.b = 3), but
there's no way to split the keys (a, b, a).  This is the limitation we were
really talking about.  This seems unlikely to be a real problem, and in the
unlikely case where it's really needed, it seems fine to require people to
just fall back on formatting the query string themselves and assign to
url.search.
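
A minimal sketch of the ordering rule described above, assuming a hypothetical OrderedQuery class standing in for the proposed url.query (neither the class nor its method names come from the draft). A new key is appended, an existing key is replaced where it already sits, and the interleaved (a, b, a) case falls back to formatting the string by hand:

```javascript
// Hypothetical stand-in for the proposed url.query object; not a real API.
class OrderedQuery {
  constructor() {
    // A Map iterates in first-insertion order, and re-setting an existing
    // key keeps that key's original position.
    this.entries = new Map();
  }

  // Arrays become multiple values for one key; anything else is stringified.
  set(key, value) {
    const values = Array.isArray(value) ? value.map(String) : [String(value)];
    this.entries.set(key, values);
  }

  toString() {
    const parts = [];
    for (const [key, values] of this.entries) {
      for (const value of values) {
        parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(value));
      }
    }
    return parts.join("&");
  }
}

const first = new OrderedQuery();
first.set("x", "1");
first.set("y", "2");
first.set("x", "9"); // replaces x in place, does not move it to the end
const firstResult = first.toString(); // "x=9&y=2"

const grouped = new OrderedQuery();
grouped.set("a", [1, 2]);
grouped.set("b", 3);
const groupedResult = grouped.toString(); // "a=1&a=2&b=3"

// Interleaved keys cannot be expressed through the map, so the string is
// built by hand (and would then be assigned to url.search).
const interleaved = [["a", 1], ["b", 2], ["a", 3]]
  .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
  .join("&"); // "a=1&b=2&a=3"
```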

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-25 Thread Boris Zbarsky


On 9/25/12 11:15 PM, Glenn Maynard wrote:

> What this doesn't allow is creating things like a=1&b=2&a=3


Ah.  That should be relatively unlikely (though forms with checkboxes in 
them can in fact lead to query strings like that).


-Boris

Re: [whatwg] New URL Standard

2012-09-25 Thread Ian Hickson

On Tue, 25 Sep 2012, Glenn Maynard wrote:
 
 What this doesn't allow is creating things like a=1&b=2&a=3.  You can 
 create a=1&a=2&b=3 (url.query.a = [1,2]; url.query.b = 3), but 
 there's no way to split the keys (a, b, a).  This is the limitation we 
 were really talking about.  This seems unlikely to be a real problem, 
 and in the unlikely case where it's really needed, it seems fine to 
 require people to just fall back on formatting the query string 
 themselves and assign to url.search.

You could even make that work, by having a special method for appending a 
new key/value pair, and just not making it accessible.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] New URL Standard

2012-09-24 Thread Tobie Langel

On Mon, Sep 24, 2012 at 10:58 AM, Anne van Kesteren ann...@annevk.nl wrote:
 The kind of predictability we have for the HTML parser, I want to have for the
 URL parser as well.

Yes, please!!

--tobie

Re: [whatwg] New URL Standard

2012-09-24 Thread Anne van Kesteren

On Sat, Sep 22, 2012 at 9:10 AM, Alexandre Morgaut
alexandre.morg...@4d.com wrote:
> Would the URLUtil interface replace the URL decomposition IDL attributes of
> the Location interface?
> - http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-decomposition-idl-attributes
> - http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#the-location-interface

Yes. My plan is to obsolete most URL parts of HTML.

> Could the search property have a key/value mapping?
> ex:
> http://test.com?param1=value1
> - var value1 = url.search.param1
> search as window.location could still be usable as a string

I have been thinking about introducing a .query attribute that would
return a special interface for this purpose, but what the right API
should be seems somewhat tricky. Adam and Erik came up with a solution
that introduces eight new methods (see
http://dvcs.w3.org/hg/url/raw-file/tip/Overview.html#url ) but I hope
we can find something more elegant. (Unless we are stuck with their
solution for some reason, but I believe that is not the case.)

> Shouldn't this document have references on some of the URL related RFCs:

The plan is to obsolete the RFCs. But yes, I will add some references
in the Goals section most likely. Similar to what has been done in the
DOM Standard.

> Should this document include a more complete list of schemes with ones that
> are more and more used in URLs?

Maybe, kinda depends on what turns out to be the ideal scope for the
URL Standard. For now I only wanted to include those schemes relevant
to the parser (and it may turn out there are a few more of those, e.g.
mailto, javascript, data, and file might need some special casing).

> Unfortunately, the URLUtil interface would not be adapted for them:
> - the protocol, host, and hostname properties make sense and would work;
> - the query part (search property) is used by the mailto: and sms: URIs;
> - for tel: and fax, we see parameters prefixed by ";" as the ones used
> in some media types, those parameters could be found in the search property

We might not want to adapt it either because of the relative increase
in complexity while not actually addressing many use cases. You want
to modify query/path for http/https and maybe ws/wss a lot, but not so
much for mailto I'd think.

--
http://annevankesteren.nl/

Re: [whatwg] New URL Standard

2012-09-24 Thread Karl Dubost


On 21 Sept. 2012, at 17:16, Anne van Kesteren wrote:
 I took a crack at defining URLs: http://url.spec.whatwg.org/

Very cool.



On cite attributes, I'm using urn:isbn:

<blockquote cite="urn:isbn:2-7073-1038-7">
   <p>J'aime la liberté. J'aime être responsable 
  de mes actes. J'aime comprendre ce que je 
  fais… Et, cependant, je donne mon accord 
  à ce marché bizarre.</p>
</blockquote>
   
I can use and parse this with an extension in Opera [1], which converts it into 
a link to the Open Library. In the future I could give access to 
different services, and the user could choose their own reference system.

In this case.
http://openlibrary.org/books/OL8913264M/Djinn


All of that to say: it would be cool to be able to grab the relevant part of the 
URI without having to regex the string returned by the cite attribute.

PS: And yes, I can live with it not being there if you say no ;)

[1]: https://addons.opera.com/fr/extensions/details/quotelink/?display=en
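
As a rough illustration of grabbing the relevant part without a regex, here is a sketch that uses the URL constructor as it exists in current browsers and Node.js. That API postdates this thread and is an assumption here, not something the draft under discussion guarantees; the variable names are illustrative only.

```javascript
// Assumes the URL constructor available in current browsers and Node.js.
const cite = "urn:isbn:2-7073-1038-7";
const parsed = new URL(cite);

const scheme = parsed.protocol; // "urn:"
const rest = parsed.pathname;   // "isbn:2-7073-1038-7" (everything after the scheme)

// Split the namespace identifier from the namespace-specific part.
const separator = rest.indexOf(":");
const namespace = rest.slice(0, separator); // "isbn"
const isbn = rest.slice(separator + 1);     // "2-7073-1038-7"
```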

-- 
Karl Dubost - http://dev.opera.com/
Developer Relations, Opera Software

Re: [whatwg] New URL Standard

2012-09-24 Thread Jukka K. Korpela


2012-09-24 12:47, Karl Dubost wrote:


> On cite attributes, I'm using urn:isbn:
>
> <blockquote cite="urn:isbn:2-7073-1038-7">
> <p>J'aime la liberté. J'aime être responsable
>    de mes actes. J'aime comprendre ce que je
>    fais… Et, cependant, je donne mon accord
>    à ce marché bizarre.</p>
> </blockquote>
>
> Which I can use and parse with an extension in Opera [1] which convert it
> into a link to the Open Library. In the future I could give accessibilities
> to different services, and the user could choose its own reference system.


This is all very cool in its own way, and could be useful when used
with discipline within a discipline. But for a long time, such cool 
ideas will not be supported in most browsing situations. Yet, authors 
who know the cool idea will apply it and will fail to duplicate any 
credits in the normal visible content. This means that to most users, a 
quotation will appear without any credits or source information.


It also means that the only immediately available source information for 
a quotation will be an ISBN in URL format. So, for example, working 
offline, you won't see even the title and the author. Would the 
quotation even satisfy the legal requirements for quotations?


If the credits are additionally given in visible content, then *there* 
is the place to do cool things with ISBNs. The credits, when they 
include the ISBN in addition to author, title, etc., could have the ISBN 
part turned to an element like <a href="urn:isbn:2-7073-1038-7">ISBN 
2-7073-1038-7</a>. (This would still suffer from lack of compatibility 
with older user agents, creating non-working links on them, so maybe 
some new markup - which would simply be ignored by old user agents - 
would be better.)


The point, however, is that the cite attribute in blockquote is broken 
by design and should not be implemented in any new ways (or old).


Yucca

Re: [whatwg] New URL Standard

2012-09-24 Thread Alexandre Morgaut


On 24 sept. 2012, at 11:34, Anne van Kesteren wrote:

 Could the search property have a key/value mapping?
 ex:
 http://test.com?param1=value1
 - var value1 = url.search.param1
 search as window.location could still be usable as a string

 I have been thinking about introducing a .query attribute that would
 return a special interface for this purpose, but what the right API
 should be seems somewhat tricky. Adam and Erik came up with a solution
 that introduces eight new methods (see
 http://dvcs.w3.org/hg/url/raw-file/tip/Overview.html#url ) but I hope
 we can find something more elegant. (Unless we are stuck with their
 solution for some reason, but I believe that is not the case.)

Yes, I saw the methods and, as with XHR and its headers, I don't find them 
user-friendly enough.
The search property could stay as is, but I personally think that having a 
Web Storage-like key/value mapping for the parameters would make the code more 
readable.
We could then have a params or parameters property with key/value mapping 
that implements the Storage interface:
http://www.w3.org/TR/webstorage/#storage-0
Developers who are more comfortable with methods would then still be happy, and 
because the interface is the same, the learning curve would be gentler.

What I would love in the enhancement of parameter management is that the 
developer should no longer need to take care of URL encoding of the names and 
values; all the encoding/decoding could be done automatically, 
either with your proposed methods or using a Storage interface...
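
To make the "automatic encoding" wish concrete, here is a sketch using URLSearchParams from current browsers and Node.js. That interface is an assumption relative to this thread (it is neither the eight-method proposal nor a Storage-based one), and it writes spaces as "+" rather than "%20".

```javascript
// Assumes URLSearchParams as shipped in current browsers and Node.js.
const params = new URLSearchParams();
params.set("subject", "current-issue");
params.set("body", "send current-issue\r\nsend index"); // raw value, no manual escaping

// Names and values are encoded on serialization.
const encoded = params.toString();
// "subject=current-issue&body=send+current-issue%0D%0Asend+index"

// Decoding is equally automatic when reading a value back.
const decoded = new URLSearchParams("body=hello%20there").get("body"); // "hello there"
```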


 Should this document include a more complete list of schemes with ones that 
 are more and more used in URLs?

 Maybe, kinda depends on what turns out to be the ideal scope for the
 URL Standard. For now I only wanted to include those schemes relevant
 to the parser (and it may turn out there is a few more of those, e.g.
 mailto, javascript, data, and file might need some special casing).

Going progressively makes sense

 Unfortunately, the URLUtil interface would not be adapted for them:
 - the protocol, host, and hostname properties make sense and would 
 work;
 - the query part (search property) is used by the mailto: and sms: URIs;
 - for tel: and fax, we see parameters prefixed by ";" as the ones used 
 in some media types, those parameters could be found in the search property

 We might not want to adapt it either because of the relative increase
 in complexity while not actually addressing many use cases. You want
 to modify query/path for http/https and maybe ws/wss a lot, but not so
 much for mailto I'd think.

I started my remark by saying "Unfortunately...", but in the end, it looks like 
the Location/URL interface, in combination with the Storage interface, should 
fit any of the mentioned schemes. The only peculiarity is the format of 
the tel: parameters (it'd be great if we could update the RFC). I must say 
I'm more comfortable with matching this URL interface to mailto:, 
tel:, sms:, and tv: than to data: or javascript:

Below are some potential examples for those schemes using the URL and the Storage 
interfaces (without showing the methods)


mailto:j...@example.com?cc=b...@example.com&subject=current-issue&body=send%20current-issue%0D%0Asend%20index

{
    host: "j...@example.com",
    hostname: "j...@example.com",
    href: "j...@example.com?cc=b...@example.com&subject=current-issue&body=send%20current-issue%0D%0Asend%20index",
    parameters: {
        cc: "b...@example.com",
        subject: "current-issue",
        body: "send current-issue\r\nsend index"
    },
    pathname: "",
    port: "",
    protocol: "mailto:",
    search: "?cc=b...@example.com&body=hello"
}


tel:+11231231234;isub=8978

{
    host: "+11231231234",
    hostname: "+11231231234",
    href: "+11231231234;isub=8978",
    parameters: {
        isub: "8978"
    },
    pathname: "",
    port: "",
    protocol: "tel:",
    search: ""
}


sms:+15105550101?body=hello%20there

{
    host: "+15105550101",
    hostname: "+15105550101",
    href: "+15105550101?body=hello%20there",
    parameters: {
        body: "hello there"
    },
    pathname: "",
    port: "",
    protocol: "sms:",
    search: ""
}


tv:west.hbo.com

{
    host: "west.hbo.com",
    hostname: "west.hbo.com",
    href: "west.hbo.com",
    parameters: {},
    pathname: "",
    port: "",
    protocol: "tv:",
    search: ""
}


data:image/png;base64; 

{
    host: "",
    hostname: "",
    href: "image/png;base64; ",
    parameters: {}, // might include auto-generated mediaType & charset string parameters and a base64 boolean parameter
    pathname: "",
    port: "",
    protocol: "data:",
    search: ""
}







Alexandre Morgaut
Wakanda Community Manager

4D SAS
60, rue d'Alsace
92110 Clichy
France

Standard : +33 1 40 87 92 00
Email :alexandre.morg...@4d.com

Re: [whatwg] New URL Standard

2012-09-24 Thread Alexandre Morgaut


On 24 sept. 2012, at 14:08, Alexandre Morgaut wrote:


 sms:+15105550101?body=hello%20there

 {
     host: "+15105550101",
     hostname: "+15105550101",
     href: "+15105550101?body=hello%20there",
     parameters: {
         body: "hello there"
     },
     pathname: "",
     port: "",
     protocol: "sms:",
     search: ""
 }

Oops, it should be

search: "?body=hello%20there"

of course.




Alexandre Morgaut
Wakanda Community Manager


Re: [whatwg] New URL Standard

2012-09-24 Thread Karl Dubost


On 24 Sept. 2012, at 12:08, Jukka K. Korpela wrote:
 It also means that the only immediately available source information for a 
 quotation will be an ISBN in URL format. So, for example, working offline, 
 you won't see even the title and the author. Would the quotation even satisfy 
 the legal requirements for quotations?

Unrelated and orthogonal.
We are not talking about a bibliographic reference model, which would be useful 
on its own.

-- 
Karl Dubost - http://dev.opera.com/
Developer Relations, Opera Software

Re: [whatwg] New URL Standard

2012-09-24 Thread Boris Zbarsky


On 9/24/12 4:58 AM, Anne van Kesteren wrote:

> Say you have <a href="data:test"/>; the concern is what e.g.
> a.protocol and a.pathname would return here. For invalid URLs they
> would return ":" and "" respectively. If we treat this as a valid URL
> you would get "data:" and "test". In Gecko I get "http:" and "". If I
> make that <a href="data:text/html,test"/> Gecko will give meaningful
> answers (well pathname is still "", maybe that is okay and pathname
> should only work for hierarchical URLs).


Ah, I see.

So what happens here is that Gecko treats this as an invalid URL (more 
precisely, it cannot create an internal URI object from this string). 
I guess that's what you were getting at: that data: URLs actually have 
a concept of "invalid" in Gecko.  This is actually true for all schemes 
Gecko supports, in general.  For example, "http://something or other" 
(with the spaces) will do the same thing.

For an invalid URI, .protocol currently returns "http:" in Gecko.  I 
have no idea why, offhand.  It could just as easily return ":".


As far as .pathname, what Gecko does is exactly what you say: .pathname 
only works on hierarchical schemes.



> More general, what I want is that for *any* given input in <a
> href="..."/>, xhr.open("GET", "..."), new URL("..."), etc. I want to be
> able to tell what the various URL components are going to be. The kind
> of predictability we have for the HTML parser, I want to have for the
> URL parser as well.


Yes, absolutely agreed.


> (If that means handling data URLs at the layer of the URL parser
> rather than a separate parser that goes over the path, as Gecko
> appears to be doing, so be it.)


We could change Gecko's handling here, for what it's worth.  One reason 
for the current handling is that right now we don't even make <a> into a 
link unless its href is a valid URI as far as Gecko is concerned.  But 
I'm considering changing that anyway, since no one else bothers with 
such niceties and they complicate implementation a bit...



>> If you want constructive advice, it would be interesting to get a full list
>> of all the weird stuff that UAs do here so we can evaluate which parts of it
>> are needed and why.  I can try to produce such a list for Gecko, if there
>> seems to be motion on the general idea.
>
> I think that would be a great start. I'm happy to start out with
> Gecko's behavior and iterate over time as feedback comes in from other
> browsers.


Hmm.  So here goes at least a partial list:

1)  On Windows and OS/2, Gecko replaces '\\' with '/' in file:// URI 
strings before doing anything else with the string when parsing a new 
URL.  That includes relative URI strings being resolved against a 
file:// base.


2)  file:// URIs are parsed as a "no authority" URL in Gecko.  Quoting 
the IDL comment:

35 /**
36  * blah:foo/bar    => blah:///foo/bar
37  * blah:/foo/bar   => blah:///foo/bar
38  * blah://foo/bar  => blah://foo/bar
39  * blah:///foo/bar => blah:///foo/bar
40  */

where the thing on the left is the input string and the thing on the 
right is the normalized form that the parser produces from it.  Note 
that this is different from how HTTP URIs are parsed, for all except the 
item on line number 38 there.


3)  Gecko does not allow setting a username, password, hostname, port on 
an existing "no authority" URL object, including file://.  Attempts to 
do that throw internally; I believe for web stuff it just becomes a no-op.


4)  For "no authority" URLs, including file://, on Windows and OS/2 
only, if what looks like the authority section looks like a drive letter, 
it's treated as part of the path.  For example, "file://c:/" is treated 
as the filename "c:\".  "Looks like a drive letter" is defined as an ASCII 
letter (any case), followed by a ':' or '|' and then followed by end of 
string or '/' or '\\'.  I'm not sure why this is checking for '\\' 
again, honestly.  ;)


5)  When parsing a "no authority" URL (including file://), and when item 
4 above does not apply, it looks like Gecko skips everything after 
"file://" up until the next '/', '?', or '#' char before parsing path stuff.


6)  On Windows and OS/2, when dynamically parsing a path for a "no 
authority" URL (not sure whether this is actually web-exposed, fwiw...) 
Gecko will do something involving looking for a path that's only an 
ASCII letter followed by ':' or '|' followed by end of string.  I'm not 
quite sure what that part is about...  It might have to do with the fact 
that URI objects in Gecko can have concepts of "directory", "filename", 
"extension" or something like that.


7)  When doing URI equality comparisons, if two file:// URIs only differ 
in their directory/filename/extension (so the actual file path), then an 
equality comparison is done on the underlying file path objects.  What 
this means depends on the OS.  On Unix this is just a straight-up byte 
by byte compare of file paths.  I think OS X now follows the Unix code 
path as do most other supported platforms.  But note that file path in 
this case is normalized in various ways.
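
Two of the items above are mechanical enough to sketch. The code below is not Gecko's implementation: normalizeNoAuthority reproduces only the four rows of the IDL comment quoted in item 2, and looksLikeDriveLetter encodes the definition given in item 4. Both function names are made up for this illustration.

```javascript
// Item 2: map an input string to the normalized "no authority" form.
// Covers only the four cases listed in the quoted IDL comment.
function normalizeNoAuthority(input) {
  const colon = input.indexOf(":");
  if (colon === -1) throw new Error("no scheme in " + input);
  const scheme = input.slice(0, colon);
  const rest = input.slice(colon + 1);
  if (rest.startsWith("//")) return input;                // lines 38 and 39: unchanged
  if (rest.startsWith("/")) return scheme + "://" + rest; // line 37
  return scheme + ":///" + rest;                          // line 36
}

// Item 4: an ASCII letter (any case), then ':' or '|', then end of string,
// '/' or '\'.
function looksLikeDriveLetter(text) {
  return /^[A-Za-z][:|](?:$|[\/\\])/.test(text);
}

const normalized = [
  "blah:foo/bar",
  "blah:/foo/bar",
  "blah://foo/bar",
  "blah:///foo/bar",
].map(normalizeNoAuthority);
// ["blah:///foo/bar", "blah:///foo/bar", "blah://foo/bar", "blah:///foo/bar"]
```

Note that the drive-letter test is applied to what follows "file://", so for "file://c:/" the text being tested is "c:/".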

Re: [whatwg] New URL Standard

2012-09-24 Thread Tab Atkins Jr.

On Mon, Sep 24, 2012 at 2:34 AM, Anne van Kesteren ann...@annevk.nl wrote:
 I have been thinking about introducing a .query attribute that would
 return a special interface for this purpose, but what the right API
 should be seems somewhat tricky. Adam and Erik came up with a solution
 that introduces eight new methods (see
 http://dvcs.w3.org/hg/url/raw-file/tip/Overview.html#url ) but I hope
 we can find something more elegant. (Unless we are stuck with their
 solution for some reason, but I believe that is not the case.)

Yeah, that interface is pretty unfriendly.

I suggest just making it a map from String -> [String].  You probably
want a little bit of magic - if the setter receives an array, replace
the current value with it; anything else, stringify then wrap in an
array and replace the current value.  The getter should return an
empty array for non-existing params.  You should be able to set .query
itself with an object, which empties out the map and then runs the
setter over all the items.  Bam, every single method is now obsolete.
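
A sketch of how that String -> [String] map could behave, assuming a Proxy-backed object. The names createQuery, query, replaceAll and serialize are invented for this example; nothing here is taken from a draft.

```javascript
// Hypothetical factory for a String -> [String] query map.
function createQuery(initial = {}) {
  const store = new Map(); // key -> array of strings, in insertion order

  // The "magic" setter: arrays replace the value, anything else is
  // stringified and wrapped in an array.
  const assign = (key, value) => {
    store.set(key, Array.isArray(value) ? value.map(String) : [String(value)]);
  };

  const query = new Proxy({}, {
    get(_target, key) {
      if (typeof key !== "string") return undefined; // ignore symbol lookups
      // Non-existing params read as an empty array; a copy is returned so
      // callers cannot mutate the stored list by accident.
      return store.has(key) ? [...store.get(key)] : [];
    },
    set(_target, key, value) {
      if (typeof key !== "string") return false;
      assign(key, value);
      return true;
    },
  });

  // Setting .query itself with an object: empty the map, then run the
  // setter over every item.
  const replaceAll = (object) => {
    store.clear();
    for (const [key, value] of Object.entries(object)) assign(key, value);
  };

  const serialize = () => {
    const parts = [];
    for (const [key, values] of store) {
      for (const value of values) {
        parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(value));
      }
    }
    return parts.join("&");
  };

  replaceAll(initial);
  return { query, replaceAll, serialize };
}

const { query, replaceAll, serialize } = createQuery({ a: 1 });
query.b = ["x", "y"];
const missing = query.nothing; // []
const before = serialize();    // "a=1&b=x&b=y"
replaceAll({ c: "z" });
const after = serialize();     // "c=z"
```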

~TJ

Re: [whatwg] New URL Standard

2012-09-24 Thread David Sheets

On Mon, Sep 24, 2012 at 2:34 AM, Anne van Kesteren ann...@annevk.nl wrote:
 On Sat, Sep 22, 2012 at 9:10 AM, Alexandre Morgaut alexandre.morg...@4d.com 
 wrote:
 Shouldn't this document have references on some of the URL related RFCs:

 The plan is to obsolete the RFCs. But yes, I will add some references
 in the Goals section most likely. Similar to what has been done in the
 DOM Standard.

Is there an issue with defining WHATWG-URL syntax as a grammar
extension to the URI syntax in RFC3986?

How about splitting the definition of the parsing algorithm into a
canonicalization algorithm and a separate parser for the extended
syntax? The type would be string -> string with the codomain as a
valid, unique WHATWG-URL serialization. Implementations/IDL could
provide only the composition of canonicalization and parsing but
humans trying to understand the semantics of the present algorithm
would be aided by having these phases explicitly defined.
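
The proposed split can be sketched as two composed functions. The recovery rules inside canonicalize below are hypothetical, picked only to have something to show; the component regular expression is the one given in RFC 3986, Appendix B; the function names are illustrative.

```javascript
// Phase 1: string -> string. Hypothetical error-recovery rules, chosen only
// for illustration: trim, treat "\" as "/", and escape spaces.
function canonicalize(input) {
  return input.trim().replace(/\\/g, "/").replace(/ /g, "%20");
}

// Phase 2: component-splitting regular expression from RFC 3986, Appendix B.
const RFC3986_COMPONENTS =
  /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function parseCanonical(text) {
  const match = RFC3986_COMPONENTS.exec(text); // every group is optional, so this always matches
  return {
    scheme: match[2],
    authority: match[4],
    path: match[5],
    query: match[7],
    fragment: match[9],
  };
}

// What an implementation would expose: the composition of both phases.
function parseUrl(input) {
  return parseCanonical(canonicalize(input));
}

const example = parseUrl("  http:\\\\example.com\\a b?x=1#f ");
// { scheme: "http", authority: "example.com", path: "/a%20b", query: "x=1", fragment: "f" }
```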

Will any means be provided to map WHATWG-URL to Internet Standard
RFC3986-URI? Is interoperability with the deployed base of URL
consumers a goal? How will those URLs in the extended syntax be mapped
into standard URIs? Will they be unrepresentable?

Thanks,

David Sheets

Re: [whatwg] New URL Standard

2012-09-24 Thread Glenn Maynard

On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

 I suggest just making it a map from String -> [String].  You probably
 want a little bit of magic - if the setter receives an array, replace
 the current value with it; anything else, stringify then wrap in an
 array and replace the current value.  The getter should return an
 empty array for non-existing params.  You should be able to set .query
 itself with an object, which empties out the map and then runs the
 setter over all the items.  Bam, every single methods is now obsolete.


When should this API guarantee that it round-trips URLs cleanly (aside from
quoting differences)?  For example, maintaining order in a=1&b=2&a=1, and
representing things like a=1&b (no '=') and a&&b (no key at all).

Not round-tripping URLs might have annoying side-effects, like trying to
use history.replaceState to replace the path portion of the URL, and
unexpectedly having the query part of the URL get shuffled around or
changed in other ways.

Maybe it could guarantee that the query round-trips only if the value is
never modified (only assigned via the ctor or assigning to href), but once
you modify the query, the order becomes normalized and any other
non-round-trip side effects happen.
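
For comparison, this is how the URLSearchParams interface in current browsers and Node.js treats those three inputs. That interface postdates this thread, so it is used here purely as an illustration; roundTrip is a made-up helper.

```javascript
// Assumes URLSearchParams as shipped in current browsers and Node.js.
function roundTrip(queryString) {
  return new URLSearchParams(queryString).toString();
}

const ordered = roundTrip("a=1&b=2&a=1"); // "a=1&b=2&a=1": order and duplicates survive
const bareKey = roundTrip("a=1&b");       // "a=1&b=": the missing '=' is not preserved
const emptyPair = roundTrip("a&&b");      // "a=&b=": the empty pair is dropped
```

So ordering round-trips, while the "no '='" and "no key at all" cases are normalized as soon as the parsed form is serialized.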

By the way, it would also be nice for the query part of this API to be
usable in isolation.  I often put query-like strings in the hash, resulting
in URLs like
"http://example.com/server/side/path?server-side-query=1#client/side/path?client-side-query=1",
and it would be nice to be able to work with both of these with the same
interface.  That is, query = new URLQuery("a=b&c=d"); query["a"] = "x";
query.toString() == "a=x&c=d";
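
A sketch of that use case with the URL and URLSearchParams objects from current browsers and Node.js, standing in for the hypothetical URLQuery above. These APIs postdate the thread and are an assumption here.

```javascript
// Assumes URL and URLSearchParams as shipped in current browsers and Node.js.
const url = new URL(
  "http://example.com/server/side/path?server-side-query=1#client/side/path?client-side-query=1"
);

// Server-side query: exposed directly on the URL object.
const serverValue = url.searchParams.get("server-side-query"); // "1"

// Client-side query: take the part of the hash after its '?' and hand it to
// the same interface, independently of any URL.
const hash = url.hash.slice(1); // drop the leading '#'
const clientQuery = new URLSearchParams(hash.slice(hash.indexOf("?") + 1));
const clientValue = clientQuery.get("client-side-query"); // "1"

// The standalone object can be edited and serialized on its own.
const standalone = new URLSearchParams("a=b&c=d");
standalone.set("a", "x");
const serialized = standalone.toString(); // "a=x&c=d"
```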

-- 
Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-24 Thread David Sheets

On Mon, Sep 24, 2012 at 4:07 PM, Glenn Maynard gl...@zewt.org wrote:
 On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

 I suggest just making it a map from String -> [String].  You probably
 want a little bit of magic - if the setter receives an array, replace
 the current value with it; anything else, stringify then wrap in an
 array and replace the current value.  The getter should return an
 empty array for non-existing params.  You should be able to set .query
 itself with an object, which empties out the map and then runs the
 setter over all the items.  Bam, every single methods is now obsolete.


 When should this API guarantee that it round-trips URLs cleanly (aside from
 quoting differences)?  For example, maintaining order in a=1&b=2&a=1, and
 representing things like a=1&b (no '=') and a&&b (no key at all).

Always. The appropriate interface is (string * string?) list. Id est,
an association list of keys and nullable values (null is
key-without-value and empty string is empty-value). If you prefer to
not use a nullable value and don't like tuple representations in JS,
you could use type: string list list

i.e.

[["key_without_value"],[],["key","value"],[],["numbers","1","2","3","4"],["",""],["","",""]]

becomes

?key_without_value&&key=value&&numbers=1,2,3,4&=&=,

where I've assumed that values after the second are concatenated with
commas (but it could be semicolons or some other separator).

Unfortunately, JavaScript does not have any lightweight product types
so a decision like this is necessary.
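
The "string list list" mapping above can be written down directly. This is only a sketch of the serialization direction, with no percent-encoding, and serializeQuery is a name invented for the example.

```javascript
// Serialize a list of string lists: [] is an empty pair, [key] is a key
// without a value, and [key, v1, v2, ...] is key=v1,v2,...
function serializeQuery(pairs) {
  const segments = pairs.map((entry) => {
    if (entry.length === 0) return "";
    const [key, ...values] = entry;
    return values.length === 0 ? key : key + "=" + values.join(",");
  });
  return "?" + segments.join("&");
}

const result = serializeQuery([
  ["key_without_value"],
  [],
  ["key", "value"],
  [],
  ["numbers", "1", "2", "3", "4"],
  ["", ""],
  ["", "", ""],
]);
// "?key_without_value&&key=value&&numbers=1,2,3,4&=&=,"
```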

 Not round-tripping URLs might have annoying side-effects, like trying to
 use history.replaceState to replace the path portion of the URL, and
 unexpectedly having the query part of the URL get shuffled around or
 changed in other ways.

That would be unacceptably broken.

 Maybe it could guarantee that the query round-trips only if the value is
 never modified (only assigned via the ctor or assigning to href), but once
 you modify the query, the order becomes normalized and any other
 non-round-trip side effects happen.

Why can't as much information as possible be preserved? There exist
many URI manipulation libraries that support maximal preservation.

 By the way, it would also be nice for the query part of this API to be
 usable in isolation.  I often put query-like strings in the hash, resulting
 in URLs like
 "http://example.com/server/side/path?server-side-query=1#client/side/path?client-side-query=1",
 and it would be nice to be able to work with both of these with the same
 interface.  That is, query = new URLQuery("a=b&c=d"); query["a"] = "x";
 query.toString() == "a=x&c=d";

Is this not already supported by creating a new URL which contains
only a relative query part?

Like: query = new URL("?a=b&c=d"); query.query["a"] = "x";
query.toString() == "?a=x&c=d";

Why is a new interface necessary?

 --
 Glenn Maynard

Re: [whatwg] New URL Standard

2012-09-24 Thread Ian Hickson

On Mon, 24 Sep 2012, David Sheets wrote:
 
 Is there an issue with defining WHATWG-URL syntax as a grammar extension 
 to the URI syntax in RFC3986?

In general, BNF isn't very useful for defining the parsing rules when you 
also need to handle non-conforming content in a correct manner. Really it 
is only useful for saying whether or not content is conforming.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] New URL Standard

2012-09-24 Thread David Sheets

On Mon, Sep 24, 2012 at 5:23 PM, Ian Hickson i...@hixie.ch wrote:
 On Mon, 24 Sep 2012, David Sheets wrote:

 Is there an issue with defining WHATWG-URL syntax as a grammar extension
 to the URI syntax in RFC3986?

 In general, BNF isn't very useful for defining the parsing rules when you
 also need to handle non-conforming content in a correct manner. Really it
 is only useful for saying whether or not content is conforming.

Your conforming WHATWG-URL syntax will have production rule alphabets
which are supersets of the alphabets in RFC3986. This is what I
propose you define and it does not necessarily have to be in BNF
(though a production rule language of some sort probably isn't a bad
idea).

If you read my mail carefully, you will notice that I address the
non-conforming identifier case in the initial canonicalization
algorithm. This normalization step is separate from the syntax of
conforming WHATWG-URLs and would define how non-conforming strings are
interpreted as conforming strings. The parsing algorithm then provides
a map from these strings into a data structure.

Error recovery and extended syntax for conforming representations are
orthogonal.

How will WHATWG-URLs which use the syntax extended from RFC3986 map
into RFC3986 URI references for systems that only support those?

Re: [whatwg] New URL Standard

2012-09-24 Thread Ian Hickson


This is Anne's spec, so I'll let him give more canonical answers, but:

On Mon, 24 Sep 2012, David Sheets wrote:
 
 Your conforming WHATWG-URL syntax will have production rule alphabets 
 which are supersets of the alphabets in RFC3986.

Not necessarily, but that's certainly possible. Personally I would 
recommend that we not change the definition of what is conforming from the 
current RFC3986/RFC3987 rules, except to the extent that the character 
encoding affects it (as per the HTML standard today).

   http://whatwg.org/html#valid-url


 This is what I propose you define and it does not necessarily have to be 
 in BNF (though a production rule language of some sort probably isn't a 
 bad idea).

We should definitely define what is a conforming URL, yes (either 
directly, or by reference to the RFCs, as HTML does now). Whether prose or 
a structured language is the better way to go depends on what the 
conformance rules are -- HTML is a good example here: it has parts that 
are defined in terms of prose (e.g. the HTML syntax as a whole), and other 
parts that are defined in terms of BNF (e.g. constraints on the contents 
of script elements in certain situations). It's up to Anne.


 Error recovery and extended syntax for conforming representations are 
 orthogonal.

Indeed.


 How will WHATWG-URLs which use the syntax extended from RFC3986 map into 
 RFC3986 URI references for systems that only support those?

The same way that those systems handle invalid URLs today, I would assume. 
Do you have any concrete systems in mind here? It would be good to add 
them to the list of systems that we test. (For what it's worth, in 
practice, I've never found software that exactly followed RFC3986 and 
also rejected any non-conforming strings. There are just too many invalid 
URLs out there for that to be a viable implementation strategy.)

I remember when I was testing this years ago, when doing the first pass on 
attempting to fix this, that I found that some less widely tested 
software, e.g. wget(1), did not handle URLs in the same manner as more 
widely tested software, e.g. IE, with the result being that Web pages were 
not handled interoperably between these two software classes. This is the 
kind of thing we want to stop, by providing a single way to parse all 
input strings, valid or invalid, as URLs.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] New URL Standard

2012-09-23 Thread Maciej Stachowiak


Excellent work.

Did you use tests while making this and if so did you save them? It might be 
worthwhile to check all the browsers against the spec.

Cheers,
Maciej

On Sep 21, 2012, at 8:16 AM, Anne van Kesteren ann...@annevk.nl wrote:

 I took a crack at defining URLs: http://url.spec.whatwg.org/
 
 At the moment it defines parsing (minus domain names / IP addresses)
 and the JavaScript API (minus the query manipulation methods proposed
 by Adam Barth). It defines things like setting .pathname to "hello
 world" (notice the space), it defines what happens if you resolve
 "http:test" against a data URL (you get "http://test/") or
 "http://teehee" (you get "http://teehee/test"). It is based on the
 various URL code paths found in WebKit and Gecko and supports the \ as
 / in various places because it seemed better for compatibility.
 
 I'm looking for some feedback/ideas on how to handle various aspects, e.g.:
 
 * data URLs; in Gecko these appear to be parsed as part of the URL
 layer, because they can turn a URL invalid. Other browsers do not do
 this. Opinions? Should data URLs support .search?
 * In the current text only a select few URLs support host/port/query.
 The rest is solely path/fragment. But maybe we want mailto to support
 query? Should it support host? (mailto supporting e.g. host would also
 mean normalising host via IDNA toASCII and friends. Not sure I'm fond
 of that.)
 * Advice on file URLs would be nice.
 * IDNA: what are your plans? IDNA2003 / IDNA2008 / UTS #46 / something
 else? It would be nice to get agreement on this.
 * Terminology: should we align the terminology with the API or would
 that just be too confusing?
 
 Thanks!
 
 
 PS: It also does the query encoding thing correctly for the first time
 ever in the history of URL standards although the wording can probably
 be improved.
 
 
 -- 
 http://annevankesteren.nl/
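
The cases listed in the quoted announcement can be checked against the URL object in current browsers and Node.js. Those implementations follow the later URL Standard, so treating them as representative of the 2012 draft is an assumption; the data: base URL below is an arbitrary example.

```javascript
// Assumes the URL constructor as shipped in current browsers and Node.js.

// Resolving "http:test" against a data URL versus an http URL.
const againstData = new URL("http:test", "data:text/html,x").href; // "http://test/"
const againstHttp = new URL("http:test", "http://teehee").href;    // "http://teehee/test"

// Setting .pathname to a value containing a space.
const url = new URL("http://example.com/");
url.pathname = "hello world";
const path = url.pathname; // "/hello%20world"

// "\" accepted where "/" is expected.
const slashes = new URL("http:\\\\example.com\\a\\b").href; // "http://example.com/a/b"
```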

Re: [whatwg] New URL Standard

2012-09-22 Thread Alexandre Morgaut


Thanks Anne, I'd appreciate being able to easily get a URLUtil interface from a 
URL string without resorting to nasty hacks.

I have a few questions:

Would the URLUtil interface replace the URL decomposition IDL attributes of 
the Location interface?
- 
http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-decomposition-idl-attributes
- 
http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#the-location-interface

Could the search property have a "query" and/or "params" (see tel: and fax: 
below) alias?

Could the search property have a key/value mapping?
ex:
http://test.com?param1=value1
- var value1 = url.search.param1
Like window.location, search could still be usable as a string.
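
To make the key/value idea concrete, here is a minimal sketch of how a search string could be turned into such a mapping. The helper name parseSearch is hypothetical and not part of any proposal in this thread; it assumes "&" as the separator and keeps repeated keys as arrays so that no value is lost.

```javascript
// Hypothetical helper, for illustration only: maps each key of a
// search string to the array of values that appear for it.
function parseSearch(search) {
  const map = Object.create(null);
  // Drop a single leading "?" if present.
  const query = search.startsWith("?") ? search.slice(1) : search;
  for (const pair of query.split("&")) {
    if (pair === "") continue; // skip empty segments such as "a=1&&b=2"
    const eq = pair.indexOf("=");
    const rawKey = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    // "+" means space in form-encoded queries; decodeURIComponent
    // throws on malformed percent-escapes, which this sketch does not catch.
    const key = decodeURIComponent(rawKey.replace(/\+/g, " "));
    const value = decodeURIComponent(rawValue.replace(/\+/g, " "));
    if (!(key in map)) map[key] = [];
    map[key].push(value);
  }
  return map;
}

const params = parseSearch("?param1=value1&tag=a&tag=b");
console.log(params.param1[0]);
console.log(params.tag);
```

A key without "=" is stored with an empty string as its value, which is one possible choice among several.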


Shouldn't this document reference some of the URL-related RFCs:

- Uniform Resource Locators (URL)
- http://tools.ietf.org/html/rfc1738

- The data URL scheme
- http://tools.ietf.org/html/rfc2397

- Uniform Resource Identifier (URI): Generic Syntax
- http://tools.ietf.org/html/rfc3986

Should this document include a more complete list of schemes, including ones 
that are increasingly used in URLs?
ex:
- mailto:
  - https://tools.ietf.org/html/rfc2368
  - https://tools.ietf.org/html/rfc6068
- tel:, fax:
  - https://tools.ietf.org/html/rfc2806
  - https://tools.ietf.org/html/rfc3966
- sms:
  - http://tools.ietf.org/html/rfc5724
- tv:
  - http://tools.ietf.org/html/rfc2838
Unfortunately, the URLUtil interface would not be well suited to them:
- the protocol, host, and hostname properties make sense and would work;
- the query part (search property) is used by the mailto: and sms: URIs;
- for tel: and fax:, parameters are prefixed by ";", like the ones used in 
some media types; those parameters could be exposed through the search property
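
As a rough illustration of the last point, the ";"-prefixed parameters of a tel: URI can be split off in much the same way as a query string. The helper parseTelUri below is hypothetical and only a sketch; it does not validate the number, and it lowercases parameter names on the assumption that they are case-insensitive. The sample URI follows the form of the examples in RFC 3966.

```javascript
// Hypothetical helper, for illustration only: splits a tel: URI into
// its number and its ";"-prefixed parameters.
function parseTelUri(uri) {
  if (!uri.toLowerCase().startsWith("tel:")) {
    throw new TypeError("not a tel: URI");
  }
  // Everything up to the first ";" is the number; the rest are parameters.
  const [number, ...rawParams] = uri.slice(4).split(";");
  const params = Object.create(null);
  for (const raw of rawParams) {
    if (raw === "") continue;
    const eq = raw.indexOf("=");
    const name = (eq === -1 ? raw : raw.slice(0, eq)).toLowerCase();
    const value = eq === -1 ? "" : decodeURIComponent(raw.slice(eq + 1));
    params[name] = value;
  }
  return { number, params };
}

const tel = parseTelUri("tel:7042;phone-context=example.com");
console.log(tel.number, tel.params["phone-context"]);
```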


PS:
Note that the fax: scheme could be supported in a form or via XHR to send PDF 
documents, PostScript documents, HTML documents with their potential print CSS...
But that would be another discussion






Alexandre Morgaut
Wakanda Community Manager

4D SAS
60, rue d'Alsace
92110 Clichy
France

Standard : +33 1 40 87 92 00
Email :alexandre.morg...@4d.com
Web :  www.4D.com

Re: [whatwg] New URL Standard

2012-09-21 Thread Boris Zbarsky


On 9/21/12 11:16 AM, Anne van Kesteren wrote:

It is based on the
various URL code paths found in WebKit and Gecko and supports the \ as
/ in various places because it seemed better for compatibility.


Or worse, depending on your use cases...


* data URLs; in Gecko these appear to be parsed as part of the URL
layer, because they can turn a URL invalid. Other browsers do not do
this. Opinions? Should data URLs support .search?


I'm not quite sure what you mean by "parsed as part of the URL layer" 
here.  What's the concern?



* Advice on file URLs would be nice.


Abandon Hope All Ye Who Enter Here?  ;)

If you want constructive advice, it would be interesting to get a full 
list of all the weird stuff that UAs do here so we can evaluate which 
parts of it are needed and why.  I can try to produce such a list for 
Gecko, if there seems to be motion on the general idea.



PS: It also does the query encoding thing correctly for the first time
ever in the history of URL standards


\o/

-Boris

Re: [whatwg] New URL Standard

2012-09-21 Thread Julian Reschke

On 2012-09-21 17:16, Anne van Kesteren wrote:

I took a crack at defining URLs: http://url.spec.whatwg.org/

At the moment it defines parsing (minus domain names / IP addresses)
and the JavaScript API (minus the query manipulation methods proposed
by Adam Barth). It defines things like setting .pathname to "hello
world" (notice the space), it defines what happens if you resolve
"http:test" against a data URL (you get "http://test/") or

As per RFC 3986, Section 5.2 (Relative Resolution), the answer IMHO is
"http:test".

Fetching from that URI indeed used http://test/ (just checked in
Mozilla), so it appears we have a terminology problem. It would be good
if we could avoid confusing "relative reference resolution" with what
you try to define here.

Note that the term "resolve" is widely used for what RFC 3986 Section
5.2 defines; see, for instance,
http://docs.oracle.com/javase/1.4.2/docs/api/java/net/URI.html#resolve%28java.lang.String%29.
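
The difference can be observed directly in a runtime that implements the URL parser described in the draft. The sketch below assumes a global URL class (as found in current browsers and Node.js, which did not exist in this form when the thread was written); the data: base URL is an arbitrary example. Under strict RFC 3986 Section 5.2 resolution, a reference that carries its own scheme is returned essentially unchanged, so both cases would yield "http:test" instead.

```javascript
// Assumes a runtime with the global URL class (modern browsers, Node.js).
// Same scheme as the base: the input is treated as a relative path.
const againstHttp = new URL("http:test", "http://teehee");
// Different scheme from the base: "test" is taken as the host.
const againstData = new URL("http:test", "data:text/plain,hello");

console.log(againstHttp.href); // http://teehee/test
console.log(againstData.href); // http://test/
```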

...

"http://teehee" (you get "http://teehee/test"). It is based on the
various URL code paths found in WebKit and Gecko and supports the \ as
/ in various places because it seemed better for compatibility.

I'm looking for some feedback/ideas on how to handle various aspects, e.g.:

* data URLs; in Gecko these appear to be parsed as part of the URL
layer, because they can turn a URL invalid. Other browsers do not do
this. Opinions? Should data URLs support .search?

...

I believe the behavior should be predictable and consistent no matter
what the URI scheme is.

Best regards, Julian

PS: and no, I don't think "URL Standard" is a good name for this document.
