Re: [whatwg] New URL Standard
On Tue, Sep 25, 2012 at 10:25 PM, Ian Hickson i...@hixie.ch wrote: You could even make that work, by having a special method for appending a new key/value pair, and just not making it accessible. Right, other access methods, like this or a classList-like array, can always be added later. (Actually, key/value pairs appended like this would still be accessible with Tab's suggestion, it's just the resulting key order that it doesn't expose.) -- Glenn Maynard
Re: [whatwg] New URL Standard
On 25/09/2012 01:07 , Glenn Maynard wrote: On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr. jackalm...@gmail.com wrote: I suggest just making it a map from String->[String]. You probably want a little bit of magic - if the setter receives an array, replace the current value with it; anything else, stringify then wrap in an array and replace the current value. The getter should return an empty array for non-existing params. You should be able to set .query itself with an object, which empties out the map and then runs the setter over all the items. Bam, every single method is now obsolete. When should this API guarantee that it round-trips URLs cleanly (aside from quoting differences)? For example, maintaining order in a=1&b=2&a=1, and representing things like a=1&b (no '=') and a&&b (no key at all). And round-tripping using ; as the separator instead of &. I mention this because I've seen actual production code (more than once) that relied on this. I have no idea how common it is though. I'm guessing not too much, but probably some since it was in HTML 4.01: http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2 Of course another option is to just not parse that into key-value pairs in the first place. By the way, it would also be nice for the query part of this API to be usable in isolation. +1 -- Robin Berjon - http://berjon.com/ - @robinberjon
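The String->[String] map with setter "magic" described above could be sketched roughly as follows. This is only an illustration: `makeQueryMap`, `replaceAll` and the use of a JavaScript `Proxy` are assumptions of the sketch and do not come from any draft.

```javascript
// Hypothetical helper (not from the draft): a String -> [String] map.
function makeQueryMap() {
  const store = new Map();

  // Setter rule from the proposal: an array replaces the current value;
  // anything else is stringified and wrapped in a one-element array.
  const normalize = (value) =>
    Array.isArray(value) ? value.map(String) : [String(value)];

  const query = new Proxy({}, {
    // Getter rule: a non-existing parameter yields an empty array.
    get(_target, key) {
      return store.has(key) ? store.get(key) : [];
    },
    set(_target, key, value) {
      store.set(key, normalize(value));
      return true;
    },
  });

  // Stand-in for "set .query itself with an object": empty the map,
  // then run the setter over every item.
  function replaceAll(obj) {
    store.clear();
    for (const [key, value] of Object.entries(obj)) query[key] = value;
  }

  return { query, replaceAll };
}

const { query, replaceAll } = makeQueryMap();
query.a = 1;                 // stored as ["1"]
query.b = ["2", "3"];        // stored as ["2", "3"]
console.log(query.a, query.b, query.missing);

replaceAll({ c: true });     // "a" and "b" are gone, "c" is ["true"]
console.log(query.a, query.c);
```

Note that a plain map like this says nothing about key order or about keys without '=', which is the round-tripping question raised in the reply.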
Re: [whatwg] New URL Standard
On 25 Sept. 2012, at 13:48, Robin Berjon wrote: On 25/09/2012 01:07 , Glenn Maynard wrote: And round-tripping using ; as the separator instead of &. I mention this because I've seen actual production code (more than once) that relied on this. I have no idea how common it is though. I'm guessing not too much, but probably some since it was in HTML 4.01: http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2 Of course another option is to just not parse that into key-value pairs in the first place. Technically ; might also be interpreted as part of the value. I think considering it as a separator would introduce more problems than the ones it could resolve (my 2 cents). By the way, it would also be nice for the query part of this API to be usable in isolation. +1 The query part should still be accessible via the search property. Alexandre Morgaut Wakanda Community Manager 4D SAS 60, rue d'Alsace 92110 Clichy France Standard : +33 1 40 87 92 00 Email : alexandre.morg...@4d.com Web : www.4D.com
Re: [whatwg] New URL Standard
On Tue, Sep 25, 2012 at 9:48 PM, Robin Berjon ro...@w3.org wrote: On 25/09/2012 01:07 , Glenn Maynard wrote: On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr. jackalm...@gmail.com wrote: I suggest just making it a map from String->[String]. You probably want a little bit of magic - if the setter receives an array, replace the current value with it; anything else, stringify then wrap in an array and replace the current value. The getter should return an empty array for non-existing params. You should be able to set .query itself with an object, which empties out the map and then runs the setter over all the items. Bam, every single method is now obsolete. When should this API guarantee that it round-trips URLs cleanly (aside from quoting differences)? For example, maintaining order in a=1&b=2&a=1, and representing things like a=1&b (no '=') and a&&b (no key at all). And round-tripping using ; as the separator instead of &. I mention this because I've seen actual production code (more than once) that relied on this. I have no idea how common it is though. I'm guessing not too much, but probably some since it was in HTML 4.01: http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2 Of course another option is to just not parse that into key-value pairs in the first place. I have also seen key-value pairs separated both by & and by ;, but not in real life in quite some time. See also the discussion here: [1]. For media fragment URIs we chose to only recommend use of & [2] (see section 51. "& is the only primary separator for name-value pairs, but some server-side languages also treat ; as a separator."). Cheers, Silvia. [1] https://discussion.dreamhost.com/thread-134179.html [2] http://www.w3.org/TR/media-frags/
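To make the separator question concrete, here is a small sketch that parses the same query string with "&" only and with both "&" and ";". The `parsePairs` function and its `separators` parameter are hypothetical names for this illustration, and percent-decoding is left out.

```javascript
// Hypothetical helper: split a query string into [key, value] pairs.
// A key with no "=" gets the value null. Empty segments (as in "a&&b")
// are dropped here, which is a simplification of this sketch.
function parsePairs(query, separators = ["&"]) {
  let parts = [query];
  for (const sep of separators) {
    parts = parts.flatMap((part) => part.split(sep));
  }
  return parts
    .filter((part) => part !== "")
    .map((part) => {
      const eq = part.indexOf("=");
      return eq === -1 ? [part, null] : [part.slice(0, eq), part.slice(eq + 1)];
    });
}

const input = "a=1;b=2&c=x;y";

// "&" only: ";" stays inside the values.
console.log(parsePairs(input));
// [["a", "1;b=2"], ["c", "x;y"]]

// "&" and ";": the same string yields four pairs.
console.log(parsePairs(input, ["&", ";"]));
// [["a", "1"], ["b", "2"], ["c", "x"], ["y", null]]
```

The two results differ for the same input, which illustrates the concern that treating ; as a separator changes the meaning of values that legitimately contain it.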
Re: [whatwg] New URL Standard
On Tue, Sep 25, 2012 at 6:18 AM, Ian Hickson i...@hixie.ch wrote: Not necessarily, but that's certainly possible. Personally I would recommend that we not change the definition of what is conforming from the current RFC3986/RFC3987 rules, except to the extent that the character encoding affects it (as per the HTML standard today). http://whatwg.org/html#valid-url FWIW, given that browsers happily do requests to servers with characters in the URL that are invalid per the RFC (they are not URL escaped) and servers handle them fine I think we should make the syntax more lenient. E.g. allowing [ and ] in the path and query component is fine I think. As for the question about why not build this on top of RFC 3986. That does not handle non-ASCII code points. RFC 3987 does, but is not a suitable start either. As shown in http://url.spec.whatwg.org/ it is quite trivial to combine parsing, resolving, and canonicalizing into a single algorithm (and deal with URI/IRI, now URL, as one). Trying to somehow patch the language in RFC 3987 to deal with the encoding problems for the query component, to deal with parsing http:example.org when there is a base URL with the same scheme versus when there isn't, etc. is way more of a hassle I think, though I am happy to be proven wrong. -- http://annevankesteren.nl/
Re: [whatwg] New URL Standard
On Mon, Sep 24, 2012 at 9:18 PM, Ian Hickson i...@hixie.ch wrote: This is Anne's spec, so I'll let him give more canonical answers, but: On Mon, 24 Sep 2012, David Sheets wrote: Your conforming WHATWG-URL syntax will have production rule alphabets which are supersets of the alphabets in RFC3986. Not necessarily, but that's certainly possible. Personally I would recommend that we not change the definition of what is conforming from the current RFC3986/RFC3987 rules, except to the extent that the character encoding affects it (as per the HTML standard today). http://whatwg.org/html#valid-url I believe the '#' character in the fragment identifier qualifies. This is what I propose you define and it does not necessarily have to be in BNF (though a production rule language of some sort probably isn't a bad idea). We should definitely define what is a conforming URL, yes (either directly, or by reference to the RFCs, as HTML does now). Whether prose or a structured language is the better way to go depends on what the conformance rules are -- HTML is a good example here: it has parts that are defined in terms of prose (e.g. the HTML syntax as a whole), and other parts that are defined in terms of BNF (e.g. constraints on the contents of script elements in certain situations). It's up to Anne. HTML is far larger and more compositional than URI. I am confident that, no matter what is specified in the WHATWG New URL Standard, a formal language exists which can describe the structure of conforming identifiers. If no such formal language can be described, the syntax specification is likely to be incomplete or unsound. How will WHATWG-URLs which use the syntax extended from RFC3986 map into RFC3986 URI references for systems that only support those? The same way that those systems handle invalid URLs today, I would assume. Do you have any concrete systems in mind here? It would be good to add them to the list of systems that we test.
(For what it's worth, in practice, I've never found software that exactly followed RFC3986 and also rejected any non-conforming strings. There are just too many invalid URLs out there for that to be a viable implementation strategy.) It is not the rejection of incoming nonconforming reference identifiers that causes issues but rather the emission of strictly conforming identifiers by Postel's Law (Robustness Principle). I know of several URI implementations that, given a nonconforming reference identifier, will only output conforming identifiers. Indeed, the standard under discussion will behave in exactly this way. This leads to loss of information in chains of URI processors that can and will change the meaning of identifiers. I remember when I was testing this years ago, when doing the first pass on attempting to fix this, that I found that some less widely tested software, e.g. wget(1), did not handle URLs in the same manner as more widely tested software, e.g. IE, with the result being that Web pages were not handled interoperably between these two software classes. This is the kind of thing we want to stop, by providing a single way to parse all input strings, valid or invalid, as URLs. Was wget in violation of the RFC? Was IE more lenient? If every string, valid or invalid, is parseable as a URI reference, is there an algorithm to accurately extract URIs from plain text? -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [whatwg] New URL Standard
On Tue, Sep 25, 2012 at 8:03 AM, Anne van Kesteren ann...@annevk.nl wrote: On Tue, Sep 25, 2012 at 6:18 AM, Ian Hickson i...@hixie.ch wrote: Not necessarily, but that's certainly possible. Personally I would recommend that we not change the definition of what is conforming from the current RFC3986/RFC3987 rules, except to the extent that the character encoding affects it (as per the HTML standard today). http://whatwg.org/html#valid-url FWIW, given that browsers happily do requests to servers with characters in the URL that are invalid per the RFC (they are not URL escaped) and servers handle them fine I think we should make the syntax more lenient. E.g. allowing [ and ] in the path and query component is fine I think. I believe this would introduce ambiguity for parsing URI references. Is [::1] an authority reference or a path segment reference? As for the question about why not build this on top of RFC 3986. That does not handle non-ASCII code points. RFC 3987 does, but is not a suitable start either. As shown in http://url.spec.whatwg.org/ it is quite trivial to combine parsing, resolving, and canonicalizing into a single algorithm (and deal with URI/IRI, now URL, as one). Composition is often trivial but unenlightening. There is necessarily less information in a partially evaluated function composition than in the functions in isolation. Defining a formal language accurately and in a broadly understandable manner is nontrivial. Your task is nontrivial. Trying to somehow patch the language in RFC 3987 to deal with the encoding problems for the query component, to deal with parsing http:example.org when there is a base URL with the same scheme versus when there isn't, etc. is way more of a hassle I think, though I am happy to be proven wrong. I believe the encoding problems are handled by a normalization algorithm and parsing relative references is handled by the base scheme module. 
What is the acceptable trade-off between (y)our hassle and the time of technologists in the coming decades? Will you make it easier or harder for them to reconcile WHATWG-URL and Internet Standard 66 (RFC 3986)? -- http://annevankesteren.nl/
Re: [whatwg] New URL Standard
On Tue, 25 Sep 2012, David Sheets wrote: Not necessarily, but that's certainly possible. Personally I would recommend that we not change the definition of what is conforming from the current RFC3986/RFC3987 rules, except to the extent that the character encoding affects it (as per the HTML standard today). http://whatwg.org/html#valid-url I believe the '#' character in the fragment identifier qualifies. Not sure what you mean. Sounds like Anne is indeed expecting to widen the range of valid URLs though, so please disregard my comments on the matter. :-) We should definitely define what is a conforming URL, yes (either directly, or by reference to the RFCs, as HTML does now). Whether prose or a structured language is the better way to go depends on what the conformance rules are -- HTML is a good example here: it has parts that are defined in terms of prose (e.g. the HTML syntax as a whole), and other parts that are defined in terms of BNF (e.g. constraints on the contents of script elements in certain situations). HTML is far larger and more compositional than URI. I am confident that, no matter what is specified in the WHATWG New URL Standard, a formal language exists which can describe the structure of conforming identifiers. If no such formal language can be described, the syntax specification is likely to be incomplete or unsound. Just because it's possible to use a formal language doesn't mean it's a good idea. It depends how clear it is. In the HTML spec, there are places where I've actually used a hybrid, using BNF with some terminals defined using prose because defining them in BNF, while possible, is confusing. How will WHATWG-URLs which use the syntax extended from RFC3986 map into RFC3986 URI references for systems that only support those? The same way that those systems handle invalid URLs today, I would assume. Do you have any concrete systems in mind here? It would be good to add them to the list of systems that we test.
(For what it's worth, in practice, I've never found software that exactly followed RFC3986 and also rejected any non-conforming strings. There are just too many invalid URLs out there for that to be a viable implementation strategy.) It is not the rejection of incoming nonconforming reference identifiers that causes issues but rather the emission of strictly conforming identifiers by Postel's Law (Robustness Principle). I know of several URI implementations that, given a nonconforming reference identifier, will only output conforming identifiers. Indeed, the standard under discussion will behave in exactly this way. This leads to loss of information in chains of URI processors that can and will change the meaning of identifiers. I don't really follow. If you have any concrete examples that would really help. I remember when I was testing this years ago, when doing the first pass on attempting to fix this, that I found that some less widely tested software, e.g. wget(1), did not handle URLs in the same manner as more widely tested software, e.g. IE, with the result being that Web pages were not handled interoperably between these two software classes. This is the kind of thing we want to stop, by providing a single way to parse all input strings, valid or invalid, as URLs. Was wget in violation of the RFC? Was IE more lenient? The RFC is so vague about what to do with non-conforming content that it's really hard to say which was in violation or more lenient. But in any case that's the wrong way to look at it. There's legacy content, there's implementations, and there's the spec. The spec is (or should be) the most mutable of these; its goal should be to define how implementations should behave in order to make the content work interoperably amongst all of the implementations, and to define the best practice for content creators to avoid known dangers.
If every string, valid or invalid, is parseable as a URI reference, is there an algorithm to accurately extract URIs from plain text? That would be an interesting thing to define, but in practice I don't think it's something implementors would care to follow. People tend to write URL fragments and expect them to be linked. For example, if I write, in an e-mail, the string "google.com,", people expect "google.com" to become a link to "http://google.com/" and for the comma to be ignored. Similarly, if I have a page on an intranet server and I write "intranet/ianh/plan.txt", it would be useful if that was turned into a link to the file. But there's nothing to distinguish that from me writing "freezing/ice/273.23K", which isn't intended to be a URL at all. Given this, I think plain text renderers will be stuck with heuristics for some time to come. (Maybe even heuristics that involve actual DNS queries and HEAD requests to see if potential URLs are useful.) -- Ian Hickson
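As an illustration of why this ends up as heuristics, here is a deliberately naive sketch of a plain-text linkifier. Everything in it is an assumption made for the example: the `guessLink` name, the tiny top-level-domain allow-list, and the choice of the http scheme.

```javascript
// Assumption of this sketch: a tiny allow-list of top-level domains.
const KNOWN_TLDS = new Set(["com", "org", "net"]);

// Hypothetical heuristic: return a guessed URL for a token, or null.
function guessLink(token) {
  // Ignore trailing punctuation such as the comma after "google.com,".
  const trimmed = token.replace(/[.,;:!?)]+$/, "");
  const host = trimmed.split("/")[0];
  const tld = host.split(".").pop().toLowerCase();
  // Only link tokens whose first segment looks like a public host name.
  if (!host.includes(".") || !KNOWN_TLDS.has(tld)) return null;
  return "http://" + trimmed + (trimmed.includes("/") ? "" : "/");
}

console.log(guessLink("google.com,"));            // "http://google.com/"
console.log(guessLink("intranet/ianh/plan.txt")); // null
console.log(guessLink("freezing/ice/273.23K"));   // null
```

The heuristic links the first token and correctly skips the third, but it also skips the intranet path, which was meant as a link. Telling the last two apart needs outside knowledge, such as the DNS lookups mentioned above.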
Re: [whatwg] New URL Standard
On Tue, Sep 25, 2012 at 8:20 PM, David Sheets kosmo...@gmail.com wrote: On Tue, Sep 25, 2012 at 8:03 AM, Anne van Kesteren ann...@annevk.nl wrote: FWIW, given that browsers happily do requests to servers with characters in the URL that are invalid per the RFC (they are not URL escaped) and servers handle them fine I think we should make the syntax more lenient. E.g. allowing [ and ] in the path and query component is fine I think. I believe this would introduce ambiguity for parsing URI references. Is [::1] an authority reference or a path segment reference? Path. As for the question about why not build this on top of RFC 3986. That does not handle non-ASCII code points. RFC 3987 does, but is not a suitable start either. As shown in http://url.spec.whatwg.org/ it is quite trivial to combine parsing, resolving, and canonicalizing into a single algorithm (and deal with URI/IRI, now URL, as one). Composition is often trivial but unenlightening. There is necessarily less information in a partially evaluated function composition than in the functions in isolation. Defining a formal language accurately and in a broadly understandable manner is nontrivial. Your task is nontrivial. I have no idea what you are talking about. What is the acceptable trade-off between (y)our hassle and the time of technologists in the coming decades? Will you make it easier or harder for them to reconcile WHATWG-URL and Internet Standard 66 (RFC 3986)? I'm not sure why I should care about STD 66. It is inaccurate, does not match implementations, and cannot be used to write new implementations that want to be compatible with content and services on the web. I am tackling those problems, and writing them down in a way we have written standards for over eight years now, which thus far has been successful. 
(Obviously STD 66 is a document many people value, but these people generally have not looked at the particulars or written software that deals with Location headers whose values contain spaces, etc. assuming they have a correct STD 66 implementation to begin with. If there is a document that addresses URLs on the web better, they will use that instead.) -- http://annevankesteren.nl/
Re: [whatwg] New URL Standard
On Mon, Sep 24, 2012 at 7:18 PM, David Sheets kosmo...@gmail.com wrote: Always. The appropriate interface is (string * string?) list. Id est, an association list of keys and nullable values (null is key-without-value and empty string is empty-value). If you prefer to not use a nullable value and don't like tuple representations in JS, you could use type: string list list i.e. [["key_without_value"],[],["key","value"],[],["numbers",1,2,3,4],[,],[,,]] This isn't an appropriate interface. It's terrible for 99.9% of use cases, where you really want dictionary-like access. The right approach is probably to expose the results in an object-like form, as Tab suggests, but to store the state internally in a list-like format, with modifications defined in terms of mutations to the list. That is, parsing a=1&b=2&a=3 would result in an internal representation like [('a', '1'), ('b', '2'), ('a', '3')]. When viewed from script, you see {a: ['1', '3'], 'b': ['2']}. If you serialize it right back to a URL the internal representation is unchanged, so the original order is preserved. The mutation algorithms can then do their best to preserve the list as reasonably as they can (eg. assigning query.a = ['5', '6'] would remove all 'a' keys, then insert items at the location of the first removed item, or append if there were none). Is this not already supported by creating a new URL which contains only a relative query part? Like: query = new URL("?a=b&c=d"); query.query["a"] = "x"; query.toString() == "?a=x&c=d"; Why is a new interface necessary? That won't work, since ?a=b&c=d isn't a valid URL. The invalid flag will be set, so the change to .query will be a no-op, and .href (presumably what toString will invoke) would return the original URL, ?a=b&c=d, not ?a=x&c=d. You'd need to do something like:

    var query = new URL("http://example.com?" + url.hash);
    query.query.a = "x";
    url.hash = query.search.slice(1); // remove the leading "?"

That's awkward, but maybe it's good enough. -- Glenn Maynard
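The list-backed approach described above can be sketched in a few functions. The names `parseQuery`, `toView`, `assign` and `serialize` are hypothetical, and the sketch ignores percent-encoding and keys without '='.

```javascript
// Internal state: an ordered list of [key, value] pairs.
function parseQuery(search) {
  return search
    .replace(/^\?/, "")
    .split("&")
    .filter((part) => part !== "")
    .map((part) => {
      const eq = part.indexOf("=");
      return eq === -1 ? [part, ""] : [part.slice(0, eq), part.slice(eq + 1)];
    });
}

// Object-like view seen from script: key -> array of values.
function toView(pairs) {
  const view = Object.create(null);
  for (const [key, value] of pairs) {
    if (!(key in view)) view[key] = [];
    view[key].push(value);
  }
  return view;
}

// Assignment rule: remove every pair with that key, then insert the new
// pairs where the first removed pair was, or append if there were none.
function assign(pairs, key, values) {
  const first = pairs.findIndex(([k]) => k === key);
  const kept = pairs.filter(([k]) => k !== key);
  const at = first === -1 ? kept.length : first;
  kept.splice(at, 0, ...values.map((v) => [key, String(v)]));
  return kept;
}

const serialize = (pairs) => pairs.map(([k, v]) => `${k}=${v}`).join("&");

const pairs = parseQuery("a=1&b=2&a=3");
console.log(serialize(pairs));                          // "a=1&b=2&a=3"
console.log(toView(pairs));                             // a: ["1", "3"], b: ["2"]
console.log(serialize(assign(pairs, "a", ["5", "6"]))); // "a=5&a=6&b=2"
console.log(serialize(assign(pairs, "c", ["9"])));      // "a=1&b=2&a=3&c=9"
```

Serializing an untouched list reproduces the original order, while an assignment collapses the repeated key into one run at its first position.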
Re: [whatwg] New URL Standard
On Tue, Sep 25, 2012 at 2:13 PM, Glenn Maynard gl...@zewt.org wrote: On Mon, Sep 24, 2012 at 7:18 PM, David Sheets kosmo...@gmail.com wrote: Always. The appropriate interface is (string * string?) list. Id est, an association list of keys and nullable values (null is key-without-value and empty string is empty-value). If you prefer to not use a nullable value and don't like tuple representations in JS, you could use type: string list list i.e. [["key_without_value"],[],["key","value"],[],["numbers",1,2,3,4],[,],[,,]] This isn't an appropriate interface. It's terrible for 99.9% of use cases, where you really want dictionary-like access. This is the direct representation of the query string key-value convention. Looking up keys is easy in an association list. Filtering the list retains ordering. Appending to the list is well-defined. Folding into a dictionary is trivial and key merging can be defined according to the author's URL convention. The right approach is probably to expose the results in an object-like form, as Tab suggests, but to store the state internally in a list-like format, with modifications defined in terms of mutations to the list. This sounds more complicated to implement while maintaining invariants. A dictionary with an associated total order is an association list. That is, parsing a=1&b=2&a=3 would result in an internal representation like [('a', '1'), ('b', '2'), ('a', '3')]. When viewed from script, you see {a: ['1', '3'], 'b': ['2']}. If you serialize it right back to a URL the internal representation is unchanged, so the original order is preserved. The mutation algorithms can then do their best to preserve the list as reasonably as they can (eg. assigning query.a = ['5', '6'] would remove all 'a' keys, then insert items at the location of the first removed item, or append if there were none). Why hide the order? Is this not already supported by creating a new URL which contains only a relative query part?
Like: query = new URL("?a=b&c=d"); query.query["a"] = "x"; query.toString() == "?a=x&c=d"; Why is a new interface necessary? That won't work, since ?a=b&c=d isn't a valid URL. ?a=b&c=d is a valid URI reference. @href="?a=b&c=d" is valid. The invalid flag will be set, so the change to .query will be a no-op, and .href (presumably what toString will invoke) would return the original URL, ?a=b&c=d, not ?a=x&c=d. You'd need to do something like: var query = new URL("http://example.com?" + url.hash); query.query.a = "x"; url.hash = query.search.slice(1); // remove the leading "?" That's awkward, but maybe it's good enough. This is a use case for parsing without composed relative resolution.
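For comparison, the association-list operations named in this message (lookup, filter, append, fold into a dictionary) might look like this in JavaScript. All function names and the `lastWins` merge policy are hypothetical choices for the sketch, not something proposed in the thread.

```javascript
// An association list: ordered [key, value] pairs, where a null value
// stands for "key without value".
const query = [["a", "1"], ["flag", null], ["b", "2"], ["a", "3"]];

// Look up every value for a key, in order.
const lookup = (list, key) =>
  list.filter(([k]) => k === key).map(([, v]) => v);

// Filtering keeps the remaining pairs in their original order.
const without = (list, key) => list.filter(([k]) => k !== key);

// Appending always adds at the end.
const append = (list, key, value) => [...list, [key, value]];

// Fold into a dictionary; `merge` is the caller's policy for repeated keys.
function toDictionary(list, merge) {
  return list.reduce((dict, [key, value]) => {
    dict[key] = key in dict ? merge(dict[key], value) : value;
    return dict;
  }, Object.create(null));
}

// One possible policy: the last occurrence of a key wins.
const lastWins = (_previous, next) => next;

console.log(lookup(query, "a"));            // ["1", "3"]
console.log(without(query, "a"));           // [["flag", null], ["b", "2"]]
console.log(append(query, "c", "4").length); // 5
console.log(toDictionary(query, lastWins)); // a: "3", flag: null, b: "2"
```

The order is always visible to the caller here, at the cost of going through helper functions rather than plain property access.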
Re: [whatwg] New URL Standard
On 26 Sept. 2012, at 00:14, David Sheets wrote: On Tue, Sep 25, 2012 at 2:13 PM, Glenn Maynard gl...@zewt.org wrote: On Mon, Sep 24, 2012 at 7:18 PM, David Sheets kosmo...@gmail.com wrote: The right approach is probably to expose the results in an object-like form, as Tab suggests, but to store the state internally in a list-like format, with modifications defined in terms of mutations to the list. Isn't that what the Web Storage API does? There, each key can be found by index using the key() method: http://www.w3.org/TR/webstorage/#dom-storage-key My only concern is that key should probably be named getKey to avoid a name collision with parameter names. Alexandre Morgaut
Re: [whatwg] New URL Standard
On Tue, Sep 25, 2012 at 5:14 PM, David Sheets kosmo...@gmail.com wrote: Looking up keys is easy in an association list. Filtering the list retains ordering. Appending to the list is well-defined. Folding into a dictionary is trivial and key merging can be defined according to the author's URL convention. I'd suggest writing out what you mean in JavaScript or JS-like pseudocode, demonstrating what it would actually look like to scripts and how it would be used. It's the quickest way to get API ideas across. The right approach is probably to expose the results in an object-like form, as Tab suggests, but to store the state internally in a list-like format, with modifications defined in terms of mutations to the list. This sounds more complicated to implement while maintaining invariants. A dictionary with an associated total order is an association list. I think it's pretty straightforward both to specify and to implement. Of course, implementations can use any internal data structure they like as long as the end result is the same. Why hide the order? Because the natural JS interface, object-like access, doesn't allow it. If you think there's an API with similar convenience to an object and natural usage in the language, then feel free to suggest it as I described above. (Of course, a separate method could exist to get access to the underlying order, if and when real use cases turn up that actually need it, and it's not unlikely that there are use cases--but so far they haven't been raised. There's nothing wrong with exposing multiple API views into the same data set, when they have clearly distinct goals and attempts to meet both sets of goals with the same API fail.) Like: query = new URL("?a=b&c=d"); query.query["a"] = "x"; query.toString() == "?a=x&c=d"; That won't work, since ?a=b&c=d isn't a valid URL. ?a=b&c=d is a valid URI reference. @href="?a=b&c=d" is valid. It's not a valid *absolute* URL, which is what you used above.
You can sidestep this either by prefixing it to make it into a valid URL (as I suggested) or by specifying a base URL; they're both pretty much equivalent here. This is a use case for parsing without composed relative resolution. Maybe, but that's a pretty complicated approach for this use case. (To summarize the mechanism he's referring to, as I understand it: the ability to use this API to parse, modify and output relative URLs without resolving them to a base URL at all.) -- Glenn Maynard
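The base-URL route can be tried with the `URL` constructor found in current browsers and Node.js. That API postdates this thread and differs from the draft discussed here: it exposes `searchParams` rather than `.query`, and it throws on a relative URL without a base instead of setting an invalid flag. The base URL below is a placeholder.

```javascript
// Placeholder base URL, only needed to make the relative reference parseable.
const base = "http://example.com/";

// A query-only relative reference resolves once a base is supplied.
const url = new URL("?a=b&c=d", base);
url.searchParams.set("a", "x");
console.log(url.search); // "?a=x&c=d"

// Without a base, the same input is rejected with a TypeError.
let threw = false;
try {
  new URL("?a=b&c=d");
} catch (error) {
  threw = error instanceof TypeError;
}
console.log(threw); // true
```

Either way the relative reference has to be resolved before it can be edited, which is the limitation that parsing without relative resolution would remove.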
Re: [whatwg] New URL Standard
On 9/25/12 6:53 PM, Glenn Maynard wrote: (Of course, a separate method could exist to get access to the underlying order, if and when real use cases turn up that actually need it, and it's not unlikely that there are use cases--but so far they haven't been raised. The obvious use case is constructing a URI with a given query by hand, right? -Boris
Re: [whatwg] New URL Standard
On Tue, Sep 25, 2012 at 8:36 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 9/25/12 6:53 PM, Glenn Maynard wrote: (Of course, a separate method could exist to get access to the underlying order, if and when real use cases turn up that actually need it, and it's not unlikely that there are use cases--but so far they haven't been raised. The obvious use case is constructing a URI with a given query by hand, right? If you already have the a=1&b=2 string, you can just assign it to .search and not use the prepared-query-parameters interface at all. -- Glenn Maynard
Re: [whatwg] New URL Standard
On 9/25/12 10:13 PM, Glenn Maynard wrote: The obvious use case is constructing a URI with a given query by hand, right? If you already have the a=1&b=2 string, you can just assign it to .search and not use the prepared-query-parameters interface at all. I was thinking more like you have the arrays ["a", "b"] (hardcoded) and [1, 2] (provided by user). -Boris
Re: [whatwg] New URL Standard
On Tue, Sep 25, 2012 at 9:27 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 9/25/12 10:13 PM, Glenn Maynard wrote: The obvious use case is constructing a URI with a given query by hand, right? If you already have the a=1&b=2 string, you can just assign it to .search and not use the prepared-query-parameters interface at all. I was thinking more like you have the arrays ["a", "b"] (hardcoded) and [1, 2] (provided by user). You usually don't care about the resulting order in that case, right? You'd just say something like

    assert(key_names.length == user_data.length); // ["a", "b"].length == [1, 2].length
    for (var i = 0; i < user_data.length; ++i)
        url.query[key_names[i]] = user_data[i];

When do you care about being able to specifically create (or distinguish) a=1&b=2 vs. b=2&a=1 (or, a bit trickier, a=1&b=2&a=3)? -- Glenn Maynard
Re: [whatwg] New URL Standard
On 9/25/12 10:36 PM, Glenn Maynard wrote: You usually don't care about the resulting order in that case, right? It's not uncommon for servers to depend on a particular order of parameters in the query string and totally fail when the ordering is different. Especially the sort of servers that have a .exe for their CGI instead of using an off-the-shelf CGI library. When do you care about being able to specifically create (or distinguish) a=1&b=2 vs. b=2&a=1 Whenever the server will barf on one of them? ;) -Boris
Re: [whatwg] New URL Standard
On Tue, Sep 25, 2012 at 9:53 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 9/25/12 10:36 PM, Glenn Maynard wrote: You usually don't care about the resulting order in that case, right? It's not uncommon for servers to depend on a particular order of parameters in the query string and totally fail when the ordering is different. Especially the sort of servers that have a .exe for their CGI instead of using an off-the-shelf CGI library. When do you care about being able to specifically create (or distinguish) a=1&b=2 vs. b=2&a=1 Whenever the server will barf on one of them? ;) It's easy enough to allow creating a specific ordering of individual items, by guaranteeing that when a key is assigned to the object, if that key didn't already exist in the query, it will be added to the end. That means you can say url.query.x = '1'; url.query.y = '2'; vs. url.query.y = '2'; url.query.x = '1'; to create x=1&y=2 and y=2&x=1, respectively. That's the behavior I'd expect anyway. (If the key already existed, it should replace it in its previous position, of course, not bump it to the end.) What this doesn't allow is creating things like a=1&b=2&a=3. You can create a=1&a=2&b=3 (url.query.a = [1,2]; url.query.b = 3), but there's no way to split the keys (a, b, a). This is the limitation we were really talking about. This seems unlikely to be a real problem, and in the unlikely case where it's really needed, it seems fine to require people to just fall back on formatting the query string themselves and assign to url.search. -- Glenn Maynard
Re: [whatwg] New URL Standard
On 9/25/12 11:15 PM, Glenn Maynard wrote: What this doesn't allow is creating things like a=1&b=2&a=3 Ah. That should be relatively unlikely (though forms with checkboxes in them can in fact lead to query strings like that). -Boris
Re: [whatwg] New URL Standard
On Tue, 25 Sep 2012, Glenn Maynard wrote: What this doesn't allow is creating things like a=1&b=2&a=3. You can create a=1&a=2&b=3 (url.query.a = [1,2]; url.query.b = 3), but there's no way to split the keys (a, b, a). This is the limitation we were really talking about. This seems unlikely to be a real problem, and in the unlikely case where it's really needed, it seems fine to require people to just fall back on formatting the query string themselves and assign to url.search. You could even make that work, by having a special method for appending a new key/value pair, and just not making it accessible. -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
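A rough sketch of such an append method, assuming a hypothetical `OrderedQuery` class that keeps its pairs in insertion order. The class and its method names are invented for this illustration and are not part of any draft.

```javascript
// Hypothetical ordered query: pairs are kept in the order they were added.
class OrderedQuery {
  constructor() {
    this.pairs = [];
  }

  // The "special method": always adds a new pair at the end,
  // even when the key is already present.
  append(key, value) {
    this.pairs.push([key, String(value)]);
    return this;
  }

  // Object-style read access: every value for a key, without exposing
  // where the pairs sit relative to other keys.
  get(key) {
    return this.pairs.filter(([k]) => k === key).map(([, v]) => v);
  }

  toString() {
    return this.pairs.map(([k, v]) => `${k}=${v}`).join("&");
  }
}

const q = new OrderedQuery().append("a", 1).append("b", 2).append("a", 3);
console.log(q.toString()); // "a=1&b=2&a=3"
console.log(q.get("a"));   // ["1", "3"]
```

The split key order (a, b, a) is produced by the appends, while reads by key still return both values for "a".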
Re: [whatwg] New URL Standard
On Mon, Sep 24, 2012 at 10:58 AM, Anne van Kesteren ann...@annevk.nl wrote: The kind of predictability we have for the HTML parser, I want to have for the URL parser as well. Yes, please!! --tobie
Re: [whatwg] New URL Standard
On Sat, Sep 22, 2012 at 9:10 AM, Alexandre Morgaut alexandre.morg...@4d.com wrote:
> Would the URLUtil interface replace the URL decomposition IDL attributes of the Location interface?
> - http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-decomposition-idl-attributes
> - http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#the-location-interface

Yes. My plan is to obsolete most URL parts of HTML.

> Could the search property have a key/value mapping? ex: http://test.com?param1=value1 -> var value1 = url.search.param1 (search, as on window.location, could still be usable as a string)

I have been thinking about introducing a .query attribute that would return a special interface for this purpose, but what the right API should be seems somewhat tricky. Adam and Erik came up with a solution that introduces eight new methods (see http://dvcs.w3.org/hg/url/raw-file/tip/Overview.html#url ) but I hope we can find something more elegant. (Unless we are stuck with their solution for some reason, but I believe that is not the case.)

> Shouldn't this document have references to some of the URL-related RFCs:

The plan is to obsolete the RFCs. But yes, I will add some references in the Goals section most likely. Similar to what has been done in the DOM Standard.

> Should this document include a more complete list of schemes with ones that are more and more used in URLs?

Maybe, kinda depends on what turns out to be the ideal scope for the URL Standard. For now I only wanted to include those schemes relevant to the parser (and it may turn out there are a few more of those, e.g. mailto, javascript, data, and file might need some special casing).

> Unfortunately, the URLUtil interface would not be adapted for them:
> - the protocol, host, and hostname properties make sense and would work;
> - the query part (search property) is used by the mailto: and sms: URIs;
> - for tel: and fax:, we see parameters prefixed by ";" as the ones used in some media types; those parameters could be found in the search property

We might not want to adapt it either because of the relative increase in complexity while not actually addressing many use cases. You want to modify query/path for http/https and maybe ws/wss a lot, but not so much for mailto I'd think.

-- http://annevankesteren.nl/
Re: [whatwg] New URL Standard
On 21 Sept. 2012, at 17:16, Anne van Kesteren wrote:
> I took a crack at defining URLs: http://url.spec.whatwg.org/

Very cool. On cite attributes, I'm using urn:isbn:

<blockquote cite="urn:isbn:2-7073-1038-7">
<p>J'aime la liberté. J'aime être responsable de mes actes. J'aime comprendre ce que je fais… Et, cependant, je donne mon accord à ce marché bizarre.</p>
</blockquote>

I can use and parse this with an extension in Opera [1], which converts it into a link to the Open Library. In the future I could give access to different services, and the user could choose their own reference system. In this case: http://openlibrary.org/books/OL8913264M/Djinn

All of that to say, it would be cool to be able to grab the relevant part of the URI without having to regex the string returned by the cite attribute.

PS: and yes, I can live with it not being there if you say no ;)

[1]: https://addons.opera.com/fr/extensions/details/quotelink/?display=en

-- Karl Dubost - http://dev.opera.com/ Developer Relations, Opera Software
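As a rough illustration of what "grabbing the relevant part" of a urn:isbn: value could look like without a regular expression, here is a minimal sketch using plain string operations. The helper names splitScheme and isbnFromCite are hypothetical and are not part of the URL draft or of Karl's extension.

```javascript
// Hypothetical helpers; neither name comes from the thread or any spec.

// Split "scheme:rest" at the first colon. Returns null if there is no colon.
function splitScheme(uri) {
  const colon = uri.indexOf(":");
  if (colon === -1) return null;
  return {
    scheme: uri.slice(0, colon).toLowerCase(),
    rest: uri.slice(colon + 1),
  };
}

// Extract the ISBN from a value such as "urn:isbn:2-7073-1038-7".
// Returns null for anything that is not a urn:isbn: identifier.
function isbnFromCite(cite) {
  const outer = splitScheme(cite);
  if (!outer || outer.scheme !== "urn") return null;
  const inner = splitScheme(outer.rest);
  if (!inner || inner.scheme !== "isbn") return null;
  return inner.rest;
}

console.log(isbnFromCite("urn:isbn:2-7073-1038-7")); // 2-7073-1038-7
console.log(isbnFromCite("http://example.com/"));    // null
```

A URL object exposing the scheme and the scheme-specific part separately would make the first helper unnecessary, which is the convenience the message asks for.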
Re: [whatwg] New URL Standard
2012-09-24 12:47, Karl Dubost wrote:
> On cite attributes, I'm using urn:isbn:
>
> <blockquote cite="urn:isbn:2-7073-1038-7">
> <p>J'aime la liberté. J'aime être responsable de mes actes. J'aime comprendre ce que je fais… Et, cependant, je donne mon accord à ce marché bizarre.</p>
> </blockquote>
>
> Which I can use and parse with an extension in Opera [1] which convert it into a link to the Open Library. In the future I could give accessibilities to different services, and the user could choose its own reference system.

This is all very cool in its own way, and could be useful when used with discipline within a discipline. But for a long time, such cool ideas will not be supported in most browsing situations. Yet, authors who know the cool idea will apply it and will fail to duplicate any credits in the normal visible content. This means that to most users, a quotation will appear without any credits or source information.

It also means that the only immediately available source information for a quotation will be an ISBN in URL format. So, for example, working offline, you won't see even the title and the author. Would the quotation even satisfy the legal requirements for quotations?

If the credits are additionally given in visible content, then *there* is the place to do cool things with ISBNs. The credits, when they include the ISBN in addition to author, title, etc., could have the ISBN part turned into an element like <a href="urn:isbn:2-7073-1038-7">ISBN 2-7073-1038-7</a>. (This would still suffer from lack of compatibility with older user agents, creating non-working links on them, so maybe some new markup, which would simply be ignored by old user agents, would be better.)

The point, however, is that the cite attribute in blockquote is broken by design and should not be implemented in any new ways (or old).

Yucca
Re: [whatwg] New URL Standard
On 24 Sept. 2012, at 11:34, Anne van Kesteren wrote:
>> Could the search property have a key/value mapping? ex: http://test.com?param1=value1 -> var value1 = url.search.param1 (search, as on window.location, could still be usable as a string)
>
> I have been thinking about introducing a .query attribute that would return a special interface for this purpose, but what the right API should be seems somewhat tricky. Adam and Erik came up with a solution that introduces eight new methods (see http://dvcs.w3.org/hg/url/raw-file/tip/Overview.html#url ) but I hope we can find something more elegant. (Unless we are stuck with their solution for some reason, but I believe that is not the case.)

Yes, I saw the methods and, as with XHR and its headers, I don't find them user-friendly enough.

The search property could stand as is, but I personally think that having a Web Storage-like key/value mapping for the parameters would make the code more readable. We could then have a params or parameters property with key/value mapping that implements the Storage interface: http://www.w3.org/TR/webstorage/#storage-0

Developers who are more comfortable with methods would then still be happy, and because it is the same interface, the learning curve would be gentler.

What I would love in the enhancement of parameter management is that the developer should no longer need to take care of URL encoding of the names and values. All that encoding/decoding could be done automatically, either with your proposed methods or using a Storage interface...

>> Should this document include a more complete list of schemes with ones that are more and more used in URLs?
>
> Maybe, kinda depends on what turns out to be the ideal scope for the URL Standard. For now I only wanted to include those schemes relevant to the parser (and it may turn out there is a few more of those, e.g. mailto, javascript, data, and file might need some special casing).

Going progressively makes sense.

>> Unfortunately, the URLUtil interface would not be adapted for them:
>> - the protocol, host, and hostname properties make sense and would work;
>> - the query part (search property) is used by the mailto: and sms: URIs;
>> - for tel: and fax:, we see parameters prefixed by ";" as the ones used in some media types; those parameters could be found in the search property
>
> We might not want to adapt it either because of the relative increase in complexity while not actually addressing many use cases. You want to modify query/path for http/https and maybe ws/wss a lot, but not so much for mailto I'd think.

I started my point by saying "Unfortunately...", but in the end, it looks like the Location/URL interface, in combination with the Storage interface, should fit any of the mentioned schemes. The only peculiarity is the format of the tel: parameters (it'd be great if we could update the RFC). I must say I'm more comfortable with matching this URL interface to mailto:, tel:, sms:, and tv: than to data: or javascript:

Below are some potential examples for those schemes using the URL and the Storage interfaces (without showing the methods):

mailto:j...@example.com?cc=b...@example.com&subject=current-issue&body=send%20current-issue%0D%0Asend%20index

    {
      host: "j...@example.com",
      hostname: "j...@example.com",
      href: "j...@example.com?cc=b...@example.com&subject=current-issue&body=send%20current-issue%0D%0Asend%20index",
      parameters: {
        cc: "b...@example.com",
        subject: "current-issue",
        body: "send current-issue\r\nsend index"
      },
      pathname: "",
      port: "",
      protocol: "mailto:",
      search: "?cc=b...@example.com&body=hello"
    }

tel:+11231231234;isub=8978

    {
      host: "+11231231234",
      hostname: "+11231231234",
      href: "+11231231234;isub=8978",
      parameters: { isub: "8978" },
      pathname: "",
      port: "",
      protocol: "tel:",
      search: ""
    }

sms:+15105550101?body=hello%20there

    {
      host: "+15105550101",
      hostname: "+15105550101",
      href: "+15105550101?body=hello%20there",
      parameters: { body: "hello there" },
      pathname: "",
      port: "",
      protocol: "sms:",
      search: ""
    }

tv:west.hbo.com

    {
      host: "west.hbo.com",
      hostname: "west.hbo.com",
      href: "west.hbo.com",
      parameters: {},
      pathname: "",
      port: "",
      protocol: "tv:",
      search: ""
    }

data:image/png;base64;

    {
      host: "",
      hostname: "",
      href: "image/png;base64;",
      parameters: {}, // might include auto-generated mediaType and charset string parameters and a base64 boolean parameter
      pathname: "",
      port: "",
      protocol: "data:",
      search: ""
    }

Alexandre Morgaut, Wakanda Community Manager, 4D SAS, 60, rue d'Alsace, 92110 Clichy, France. Standard: +33 1 40 87 92 00. Email: alexandre.morg...@4d.com
Re: [whatwg] New URL Standard
On 24 Sept. 2012, at 14:08, Alexandre Morgaut wrote:
> sms:+15105550101?body=hello%20there
>
>     {
>       host: "+15105550101",
>       hostname: "+15105550101",
>       href: "+15105550101?body=hello%20there",
>       parameters: { body: "hello there" },
>       pathname: "",
>       port: "",
>       protocol: "sms:",
>       search: ""
>     }

Oops, it should be search: "?body=hello%20there" of course.

Alexandre Morgaut, Wakanda Community Manager, 4D SAS, 60, rue d'Alsace, 92110 Clichy, France. Standard: +33 1 40 87 92 00. Email: alexandre.morg...@4d.com Web: www.4D.com
Re: [whatwg] New URL Standard
On 24 Sept. 2012, at 12:08, Jukka K. Korpela wrote:
> It also means that the only immediately available source information for a quotation will be an ISBN in URL format. So, for example, working offline, you won't see even the title and the author. Would the quotation even satisfy the legal requirements for quotations?

Unrelated and orthogonal. We are not talking about a bibliographical reference model, which would be useful on its own.

-- Karl Dubost - http://dev.opera.com/ Developer Relations, Opera Software
Re: [whatwg] New URL Standard
On 9/24/12 4:58 AM, Anne van Kesteren wrote:
> Say you have <a href="data:test"/>; the concern is what e.g. a.protocol and a.pathname would return here. For invalid URLs they would return ":" and "" respectively. If we treat this as a valid URL you would get "data:" and "test". In Gecko I get "http:" and "". If I make that <a href="data:text/html,test"/> Gecko will give meaningful answers (well pathname is still "", maybe that is okay and pathname should only work for hierarchical URLs).

Ah, I see. So what happens here is that Gecko treats this as an invalid URL (more precisely, it cannot create an internal URI object from this string). I guess that's what you were getting at: that data: URLs actually have a concept of "invalid" in Gecko.

This is actually true for all schemes Gecko supports, in general. For example, "http://something or other" (with the spaces) will do the same thing.

For an invalid URI, .protocol currently returns "http:" in Gecko. I have no idea why, offhand. It could just as easily return ":".

As far as .pathname, what Gecko does is exactly what you say: .pathname only works on hierarchical schemes.

> More general, what I want is that for *any* given input in <a href="...">, xhr.open("GET", ...), new URL(...), etc. I want to be able to tell what the various URL components are going to be. The kind of predictability we have for the HTML parser, I want to have for the URL parser as well.

Yes, absolutely agreed.

> (If that means handling data URLs at the layer of the URL parser rather than a separate parser that goes over the path, as Gecko appears to be doing, so be it.)

We could change Gecko's handling here, for what it's worth. One reason for the current handling is that right now we don't even make <a> into a link unless its href is a valid URI as far as Gecko is concerned. But I'm considering changing that anyway, since no one else bothers with such niceties and they complicate implementation a bit...

>> If you want constructive advice, it would be interesting to get a full list of all the weird stuff that UAs do here so we can evaluate which parts of it are needed and why. I can try to produce such a list for Gecko, if there seems to be motion on the general idea.
>
> I think that would be a great start. I'm happy to start out with Gecko's behavior and iterate over time as feedback comes in from other browsers.

Hmm. So here goes at least a partial list:

1) On Windows and OS/2, Gecko replaces '\\' with '/' in file:// URI strings before doing anything else with the string when parsing a new URL. That includes relative URI strings being resolved against a file:// base.

2) file:// URIs are parsed as a "no authority" URL in Gecko. Quoting the IDL comment:

    35 /**
    36  * blah:foo/bar    => blah:///foo/bar
    37  * blah:/foo/bar   => blah:///foo/bar
    38  * blah://foo/bar  => blah://foo/bar
    39  * blah:///foo/bar => blah:///foo/bar
    40  */

where the thing on the left is the input string and the thing on the right is the normalized form that the parser produces from it. Note that this is different from how HTTP URIs are parsed, for all except the item on line number 38 there.

3) Gecko does not allow setting a username, password, hostname, port on an existing "no authority" URL object, including file://. Attempts to do that throw internally; I believe for web stuff it just becomes a no-op.

4) For "no authority" URLs, including file://, on Windows and OS/2 only, if what looks like the authority section looks like a drive letter, it's treated as part of the path. For example, "file://c:/" is treated as the filename "c:\". "Looks like a drive letter" is defined as "ASCII letter (any case), followed by a ':' or '|' and then followed by end of string or '/' or '\\'". I'm not sure why this is checking for '\\' again, honestly. ;)

5) When parsing a "no authority" URL (including file://), and when item 4 above does not apply, it looks like Gecko skips everything after "file://" up until the next '/', '?', or '#' char before parsing path stuff.

6) On Windows and OS/2, when dynamically parsing a path for a "no authority" URL (not sure whether this is actually web-exposed, fwiw...) Gecko will do something involving looking for a path that's only an ASCII letter followed by ':' or '|' followed by end of string. I'm not quite sure what that part is about... It might have to do with the fact that URI objects in Gecko can have concepts of "directory", "filename", "extension" or something like that.

7) When doing URI equality comparisons, if two file:// URIs only differ in their directory/filename/extension (so the actual file path), then an equality comparison is done on the underlying file path objects. What this means depends on the OS. On Unix this is just a straight-up byte by byte compare of file paths. I think OS X now follows the Unix code path as do most other supported platforms. But note that "file path" in this case is normalized in various ways.
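Items 1 and 4 in the list above are concrete enough to sketch. The following is a minimal illustration built only from the wording of the message; it is not Gecko's code, and the function names and the isWindowsLike flag are assumptions made for this example.

```javascript
// Sketch of items 1 and 4 as described in prose above. Not Gecko's
// implementation; both function names are hypothetical.

// Item 1: backslashes become forward slashes before any other parsing.
// The message says this happens only on Windows and OS/2, which the
// isWindowsLike flag models here.
function normalizeFileSlashes(input, isWindowsLike) {
  return isWindowsLike ? input.replace(/\\/g, "/") : input;
}

// Item 4: "looks like a drive letter" means an ASCII letter (any case),
// followed by ':' or '|', followed by end of string, '/' or '\'.
function looksLikeDriveLetter(authority) {
  return /^[A-Za-z][:|](?:$|[\/\\])/.test(authority);
}

console.log(normalizeFileSlashes("file:\\\\c:\\dir", true)); // file://c:/dir
console.log(looksLikeDriveLetter("c:/"));  // true
console.log(looksLikeDriveLetter("C|"));   // true
console.log(looksLikeDriveLetter("cd:/")); // false
```

If item 1 has already run, the backslash alternative in the second check can never match, which may be why the message wonders about the repeated '\\' test.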
Re: [whatwg] New URL Standard
On Mon, Sep 24, 2012 at 2:34 AM, Anne van Kesteren ann...@annevk.nl wrote:
> I have been thinking about introducing a .query attribute that would return a special interface for this purpose, but what the right API should be seems somewhat tricky. Adam and Erik came up with a solution that introduces eight new methods (see http://dvcs.w3.org/hg/url/raw-file/tip/Overview.html#url ) but I hope we can find something more elegant. (Unless we are stuck with their solution for some reason, but I believe that is not the case.)

Yeah, that interface is pretty unfriendly. I suggest just making it a map from String -> [String]. You probably want a little bit of magic: if the setter receives an array, replace the current value with it; anything else, stringify then wrap in an array and replace the current value. The getter should return an empty array for non-existing params. You should be able to set .query itself with an object, which empties out the map and then runs the setter over all the items.

Bam, every single one of those methods is now obsolete.

~TJ
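The String -> [String] map with "a little bit of magic" could look roughly like the sketch below. It is an illustration under assumptions: QueryMap, get, set and replaceAll are hypothetical names, and explicit methods stand in for the property getters and setters the message has in mind.

```javascript
// Hypothetical sketch of the String -> [String] map suggested above.
// Class and method names are illustrative, not a proposed API.
class QueryMap {
  constructor() {
    this.map = new Map(); // key -> array of string values
  }

  // Getter: an empty array for params that do not exist.
  get(key) {
    return this.map.has(key) ? this.map.get(key).slice() : [];
  }

  // Setter: arrays replace the current value; anything else is
  // stringified and wrapped in a one-element array.
  set(key, value) {
    const values = Array.isArray(value) ? value.map(String) : [String(value)];
    this.map.set(key, values);
  }

  // Setting the whole query from an object: empty the map, then run
  // the setter over every item.
  replaceAll(obj) {
    this.map.clear();
    for (const [key, value] of Object.entries(obj)) {
      this.set(key, value);
    }
  }

  toString() {
    const parts = [];
    for (const [key, values] of this.map) {
      for (const v of values) {
        parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(v));
      }
    }
    return parts.join("&");
  }
}

const q = new QueryMap();
q.replaceAll({ a: [1, 2], b: 3 });
console.log(q.toString());     // a=1&a=2&b=3
console.log(q.get("missing")); // []
```

Repeated keys are grouped under one entry, so interleavings such as a, b, a cannot be represented in this model.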
Re: [whatwg] New URL Standard
On Mon, Sep 24, 2012 at 2:34 AM, Anne van Kesteren ann...@annevk.nl wrote:
> On Sat, Sep 22, 2012 at 9:10 AM, Alexandre Morgaut alexandre.morg...@4d.com wrote:
>> Shouldn't this document have references on some of the URL related RFCs:
>
> The plan is to obsolete the RFCs. But yes, I will add some references in the Goals section most likely. Similar to what has been done in the DOM Standard.

Is there an issue with defining WHATWG-URL syntax as a grammar extension to the URI syntax in RFC3986?

How about splitting the definition of the parsing algorithm into a canonicalization algorithm and a separate parser for the extended syntax? The type would be string -> string with the codomain as a valid, unique WHATWG-URL serialization. Implementations/IDL could provide only the composition of canonicalization and parsing, but humans trying to understand the semantics of the present algorithm would be aided by having these phases explicitly defined.

Will any means be provided to map WHATWG-URL to Internet Standard RFC3986-URI? Is interoperability with the deployed base of URL consumers a goal? How will those URLs in the extended syntax be mapped into standard URIs? Will they be unrepresentable?

Thanks,
David Sheets
Re: [whatwg] New URL Standard
On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
> I suggest just making it a map from String -> [String]. You probably want a little bit of magic: if the setter receives an array, replace the current value with it; anything else, stringify then wrap in an array and replace the current value. The getter should return an empty array for non-existing params. You should be able to set .query itself with an object, which empties out the map and then runs the setter over all the items. Bam, every single methods is now obsolete.

When should this API guarantee that it round-trips URLs cleanly (aside from quoting differences)? For example, maintaining order in a=1&b=2&a=1, and representing things like a=1&b (no '=') and a&&b (no key at all).

Not round-tripping URLs might have annoying side-effects, like trying to use history.replaceState to replace the path portion of the URL, and unexpectedly having the query part of the URL get shuffled around or changed in other ways.

Maybe it could guarantee that the query round-trips only if the value is never modified (only assigned via the ctor or assigning to href), but once you modify the query, the order becomes normalized and any other non-round-trip side effects happen.

By the way, it would also be nice for the query part of this API to be usable in isolation. I often put query-like strings in the hash, resulting in URLs like "http://example.com/server/side/path?server-side-query=1#client/side/path?client-side-query=1", and it would be nice to be able to work with both of these with the same interface. That is, query = new URLQuery("a=b&c=d"); query["a"] = "x"; query.toString() == "a=x&c=d";

-- Glenn Maynard
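One way to read the "round-trips only if never modified" idea, combined with a query object usable in isolation, is sketched below. This is an assumption-laden illustration: the URLQuery name comes from the message, but the set() method, the lazy parsing, and the handling of valueless keys are choices made for this example only.

```javascript
// Illustrative sketch: the original text is returned untouched until the
// first modification, after which the query is re-serialized from pairs.
// The set() method and all internals are assumptions for this example.
class URLQuery {
  constructor(raw) {
    this.raw = raw;    // original text, returned verbatim while unmodified
    this.pairs = null; // parsed lazily on first modification
  }

  parse() {
    if (this.pairs === null) {
      this.pairs = this.raw === "" ? [] : this.raw.split("&").map((part) => {
        const eq = part.indexOf("=");
        // null value marks a key that had no '=' at all.
        // decodeURIComponent throws on malformed escapes; a real parser
        // would need to tolerate those. '+' is not treated as a space here.
        return eq === -1
          ? [decodeURIComponent(part), null]
          : [decodeURIComponent(part.slice(0, eq)),
             decodeURIComponent(part.slice(eq + 1))];
      });
    }
    return this.pairs;
  }

  set(key, value) {
    const pairs = this.parse();
    const hit = pairs.find((pair) => pair[0] === key);
    if (hit) {
      hit[1] = String(value);
    } else {
      pairs.push([key, String(value)]);
    }
  }

  toString() {
    if (this.pairs === null) return this.raw; // never modified: exact round trip
    return this.pairs
      .map(([k, v]) => (v === null
        ? encodeURIComponent(k)
        : encodeURIComponent(k) + "=" + encodeURIComponent(v)))
      .join("&");
  }
}

const untouched = new URLQuery("a=1&b&&a=1");
console.log(untouched.toString()); // a=1&b&&a=1

const query = new URLQuery("a=b&c=d");
query.set("a", "x");
console.log(query.toString()); // a=x&c=d
```

Because the class takes a bare string, the same object could wrap either the search part or a query-like string stored in the fragment.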
Re: [whatwg] New URL Standard
On Mon, Sep 24, 2012 at 4:07 PM, Glenn Maynard gl...@zewt.org wrote:
> On Mon, Sep 24, 2012 at 12:30 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
>> I suggest just making it a map from String -> [String]. You probably want a little bit of magic: if the setter receives an array, replace the current value with it; anything else, stringify then wrap in an array and replace the current value. The getter should return an empty array for non-existing params. You should be able to set .query itself with an object, which empties out the map and then runs the setter over all the items. Bam, every single methods is now obsolete.
>
> When should this API guarantee that it round-trips URLs cleanly (aside from quoting differences)? For example, maintaining order in a=1&b=2&a=1, and representing things like a=1&b (no '=') and a&&b (no key at all).

Always. The appropriate interface is (string * string?) list. Id est, an association list of keys and nullable values (null is key-without-value and empty string is empty-value). If you prefer to not use a nullable value and don't like tuple representations in JS, you could use type: string list list

i.e. [["key_without_value"],[],["key","value"],[],["numbers","1","2","3","4"],["",""],["","",""]]

becomes

?key_without_value&&key=value&&numbers=1,2,3,4&=&=,

where I've assumed that values after the second are concatenated with commas (but it could be semicolons or some other separator).

Unfortunately, JavaScript does not have any lightweight product types so a decision like this is necessary.

> Not round-tripping URLs might have annoying side-effects, like trying to use history.replaceState to replace the path portion of the URL, and unexpectedly having the query part of the URL get shuffled around or changed in other ways.

That would be unacceptably broken.

> Maybe it could guarantee that the query round-trips only if the value is never modified (only assigned via the ctor or assigning to href), but once you modify the query, the order becomes normalized and any other non-round-trip side effects happen.

Why can't as much information as possible be preserved? There exist many URI manipulation libraries that support maximal preservation.

> By the way, it would also be nice for the query part of this API to be usable in isolation. I often put query-like strings in the hash, resulting in URLs like "http://example.com/server/side/path?server-side-query=1#client/side/path?client-side-query=1", and it would be nice to be able to work with both of these with the same interface. That is, query = new URLQuery("a=b&c=d"); query["a"] = "x"; query.toString() == "a=x&c=d";

Is this not already supported by creating a new URL which contains only a relative query part? Like: query = new URL("?a=b&c=d"); query.query["a"] = "x"; query.toString() == "?a=x&c=d";

Why is a new interface necessary?

> -- Glenn Maynard
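The "string list list" representation and its serialization, as given in the example above, can be sketched directly. The function name serializePairs is hypothetical, and percent-encoding is deliberately left out so that the output can be compared with the string in the message.

```javascript
// Sketch of the "string list list" serialization from the example above.
// serializePairs is a hypothetical name; no percent-encoding is applied.
function serializePairs(items) {
  const parts = items.map((item) => {
    if (item.length === 0) return "";        // [] is an empty segment
    const [key, ...values] = item;
    if (values.length === 0) return key;     // [key] is a key without '='
    return key + "=" + values.join(",");     // extra values joined by commas
  });
  return "?" + parts.join("&");
}

const example = [
  ["key_without_value"],
  [],
  ["key", "value"],
  [],
  ["numbers", "1", "2", "3", "4"],
  ["", ""],
  ["", "", ""],
];

console.log(serializePairs(example));
// ?key_without_value&&key=value&&numbers=1,2,3,4&=&=,
```

Every segment of the query, including empty ones, has its own list entry, which is what allows order and duplicates to survive a round trip.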
Re: [whatwg] New URL Standard
On Mon, 24 Sep 2012, David Sheets wrote:
> Is there an issue with defining WHATWG-URL syntax as a grammar extension to the URI syntax in RFC3986?

In general, BNF isn't very useful for defining the parsing rules when you also need to handle non-conforming content in a correct manner. Really it is only useful for saying whether or not content is conforming.

-- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [whatwg] New URL Standard
On Mon, Sep 24, 2012 at 5:23 PM, Ian Hickson i...@hixie.ch wrote:
> On Mon, 24 Sep 2012, David Sheets wrote:
>> Is there an issue with defining WHATWG-URL syntax as a grammar extension to the URI syntax in RFC3986?
>
> In general, BNF isn't very useful for defining the parsing rules when you also need to handle non-conforming content in a correct manner. Really it is only useful for saying whether or not content is conforming.

Your conforming WHATWG-URL syntax will have production rule alphabets which are supersets of the alphabets in RFC3986. This is what I propose you define and it does not necessarily have to be in BNF (though a production rule language of some sort probably isn't a bad idea).

If you read my mail carefully, you will notice that I address the non-conforming identifier case in the initial canonicalization algorithm. This normalization step is separate from the syntax of conforming WHATWG-URLs and would define how non-conforming strings are interpreted as conforming strings. The parsing algorithm then provides a map from these strings into a data structure.

Error recovery and extended syntax for conforming representations are orthogonal.

How will WHATWG-URLs which use the syntax extended from RFC3986 map into RFC3986 URI references for systems that only support those?
Re: [whatwg] New URL Standard
This is Anne's spec, so I'll let him give more canonical answers, but:

On Mon, 24 Sep 2012, David Sheets wrote:
> Your conforming WHATWG-URL syntax will have production rule alphabets which are supersets of the alphabets in RFC3986.

Not necessarily, but that's certainly possible. Personally I would recommend that we not change the definition of what is conforming from the current RFC3986/RFC3987 rules, except to the extent that the character encoding affects it (as per the HTML standard today). http://whatwg.org/html#valid-url

> This is what I propose you define and it does not necessarily have to be in BNF (though a production rule language of some sort probably isn't a bad idea).

We should definitely define what is a conforming URL, yes (either directly, or by reference to the RFCs, as HTML does now). Whether prose or a structured language is the better way to go depends on what the conformance rules are -- HTML is a good example here: it has parts that are defined in terms of prose (e.g. the HTML syntax as a whole), and other parts that are defined in terms of BNF (e.g. constraints on the contents of script elements in certain situations). It's up to Anne.

> Error recovery and extended syntax for conforming representations are orthogonal.

Indeed.

> How will WHATWG-URLs which use the syntax extended from RFC3986 map into RFC3986 URI references for systems that only support those?

The same way that those systems handle invalid URLs today, I would assume. Do you have any concrete systems in mind here? It would be good to add them to the list of systems that we test. (For what it's worth, in practice, I've never found software that exactly followed RFC3986 and also rejected any non-conforming strings. There are just too many invalid URLs out there for that to be a viable implementation strategy.)

I remember when I was testing this years ago, when doing the first pass on attempting to fix this, that I found that some less widely tested software, e.g. wget(1), did not handle URLs in the same manner as more widely tested software, e.g. IE, with the result being that Web pages were not handled interoperably between these two software classes. This is the kind of thing we want to stop, by providing a single way to parse all input strings, valid or invalid, as URLs.

-- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [whatwg] New URL Standard
Excellent work. Did you use tests while making this and if so did you save them? It might be worthwhile to check all the browsers against the spec.

Cheers,
Maciej

On Sep 21, 2012, at 8:16 AM, Anne van Kesteren ann...@annevk.nl wrote:
> I took a crack at defining URLs: http://url.spec.whatwg.org/
>
> At the moment it defines parsing (minus domain names / IP addresses) and the JavaScript API (minus the query manipulation methods proposed by Adam Barth). It defines things like setting .pathname to "hello world" (notice the space), it defines what happens if you resolve "http:test" against a data URL (you get "http://test/") or "http://teehee" (you get "http://teehee/test"). It is based on the various URL code paths found in WebKit and Gecko and supports the \ as / in various places because it seemed better for compatibility.
>
> I'm looking for some feedback/ideas on how to handle various aspects, e.g.:
>
> * data URLs; in Gecko these appear to be parsed as part of the URL layer, because they can turn a URL invalid. Other browsers do not do this. Opinions? Should data URLs support .search?
> * In the current text only a select few URLs support host/port/query. The rest is solely path/fragment. But maybe we want mailto to support query? Should it support host? (mailto supporting e.g. host would also mean normalising host via IDNA toASCII and friends. Not sure I'm fond of that.)
> * Advice on file URLs would be nice.
> * IDNA: what are your plans? IDNA2003 / IDNA2008 / UTS #46 / something else? It would be nice to get agreement on this.
> * Terminology: should we align the terminology with the API or would that just be too confusing?
>
> Thanks!
>
> PS: It also does the query encoding thing correctly for the first time ever in the history of URL standards although the wording can probably be improved.
>
> -- http://annevankesteren.nl/
Re: [whatwg] New URL Standard
Thanks Anne, I'd appreciate being able to easily get a URLUtil interface from a string URL without doing some nasty hacks.

I have a few questions:

Would the URLUtil interface replace the URL decomposition IDL attributes of the Location interface?
- http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-decomposition-idl-attributes
- http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#the-location-interface

Could the search property have a query and/or params (see tel: and fax: below) alias?

Could the search property have a key/value mapping? ex: http://test.com?param1=value1 -> var value1 = url.search.param1 (search, as on window.location, could still be usable as a string)

Shouldn't this document have references to some of the URL-related RFCs:
- Uniform Resource Locators (URL): http://tools.ietf.org/html/rfc1738
- The data URL scheme: http://tools.ietf.org/html/rfc2397
- Uniform Resource Identifier (URI): Generic Syntax: http://tools.ietf.org/html/rfc3986

Should this document include a more complete list of schemes with ones that are more and more used in URLs? ex:
- mailto:
  - https://tools.ietf.org/html/rfc2368
  - https://tools.ietf.org/html/rfc6068
- tel:, fax:
  - https://tools.ietf.org/html/rfc2806
  - https://tools.ietf.org/html/rfc3966
- sms:
  - http://tools.ietf.org/html/rfc5724
- tv:
  - http://tools.ietf.org/html/rfc2838

Unfortunately, the URLUtil interface would not be adapted for them:
- the protocol, host, and hostname properties make sense and would work;
- the query part (search property) is used by the mailto: and sms: URIs;
- for tel: and fax:, we see parameters prefixed by ";" as the ones used in some media types; those parameters could be found in the search property

PS: Note that the fax: scheme could be supported in a form or via XHR to send PDF documents, PostScript documents, HTML documents with their potential CSS print... But that would be another discussion.

On 21 Sept. 2012, at 17:16, Anne van Kesteren wrote:
> I took a crack at defining URLs: http://url.spec.whatwg.org/
>
> At the moment it defines parsing (minus domain names / IP addresses) and the JavaScript API (minus the query manipulation methods proposed by Adam Barth). It defines things like setting .pathname to "hello world" (notice the space), it defines what happens if you resolve "http:test" against a data URL (you get "http://test/") or "http://teehee" (you get "http://teehee/test"). It is based on the various URL code paths found in WebKit and Gecko and supports the \ as / in various places because it seemed better for compatibility.
>
> I'm looking for some feedback/ideas on how to handle various aspects, e.g.:
>
> * data URLs; in Gecko these appear to be parsed as part of the URL layer, because they can turn a URL invalid. Other browsers do not do this. Opinions? Should data URLs support .search?
> * In the current text only a select few URLs support host/port/query. The rest is solely path/fragment. But maybe we want mailto to support query? Should it support host? (mailto supporting e.g. host would also mean normalising host via IDNA toASCII and friends. Not sure I'm fond of that.)
> * Advice on file URLs would be nice.
> * IDNA: what are your plans? IDNA2003 / IDNA2008 / UTS #46 / something else? It would be nice to get agreement on this.
> * Terminology: should we align the terminology with the API or would that just be too confusing?
>
> Thanks!
>
> PS: It also does the query encoding thing correctly for the first time ever in the history of URL standards although the wording can probably be improved.
>
> -- http://annevankesteren.nl/

Alexandre Morgaut, Wakanda Community Manager, 4D SAS, 60, rue d'Alsace, 92110 Clichy, France. Standard: +33 1 40 87 92 00. Email: alexandre.morg...@4d.com Web: www.4D.com
Re: [whatwg] New URL Standard
On 9/21/12 11:16 AM, Anne van Kesteren wrote:
> It is based on the various URL code paths found in WebKit and Gecko and supports the \ as / in various places because it seemed better for compatibility.

Or worse, depending on your use cases...

> * data URLs; in Gecko these appear to be parsed as part of the URL layer, because they can turn a URL invalid. Other browsers do not do this. Opinions? Should data URLs support .search?

I'm not quite sure what you mean by "parsed as part of the URL layer" here. What's the concern?

> * Advice on file URLs would be nice.

Abandon Hope All Ye Who Enter Here? ;)

If you want constructive advice, it would be interesting to get a full list of all the weird stuff that UAs do here so we can evaluate which parts of it are needed and why. I can try to produce such a list for Gecko, if there seems to be motion on the general idea.

> PS: It also does the query encoding thing correctly for the first time ever in the history of URL standards

\o/

-Boris
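To make the "\ as /" point concrete, here is a small sketch, assuming the URL class found in current browsers and Node.js (which postdates this thread). The host example.com and the foo: scheme are placeholders chosen for illustration. It suggests that the backslash handling applies to schemes like http: but not to an arbitrary scheme.

```javascript
// Assumes the global URL class (current browsers, Node.js).
// In JavaScript source, "\\" denotes a single backslash character.

// For http:, backslashes are accepted where slashes are expected.
const special = new URL("http:\\\\example.com\\a\\b");
const href = special.href; // "http://example.com/a/b"

// For an arbitrary scheme (foo: is hypothetical), they are left alone
// and end up verbatim in the path.
const other = new URL("foo:\\\\example.com\\a");
const opaquePath = other.pathname; // two backslashes, example.com, backslash, a

console.log(href);
console.log(opaquePath);
```

Whether that is better or worse does depend on the use case: the same input string yields a host in one scheme and an opaque path in the other.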
Re: [whatwg] New URL Standard
On 2012-09-21 17:16, Anne van Kesteren wrote:
> I took a crack at defining URLs: http://url.spec.whatwg.org/
> At the moment it defines parsing (minus domain names / IP addresses) and the JavaScript API (minus the query manipulation methods proposed by Adam Barth). It defines things like setting .pathname to "hello world" (notice the space), it defines what happens if you resolve "http:test" against a data URL (you get "http://test/") or

As per RFC 3986, Section 5.2 (Relative Resolution), the answer IMHO is "http:test". Fetching from that URI indeed used http://test/ (just checked in Mozilla), so it appears we have a terminology problem. It would be good if we could avoid confusing relative reference resolution with what you are trying to define here. Note that the term "resolve" is widely used for what RFC 3986 Section 5.2 defines; see, for instance, http://docs.oracle.com/javase/1.4.2/docs/api/java/net/URI.html#resolve%28java.lang.String%29.

> ...
> "http://teehee" (you get "http://teehee/test"). It is based on the various URL code paths found in WebKit and Gecko and supports the \ as / in various places because it seemed better for compatibility.
> I'm looking for some feedback/ideas on how to handle various aspects, e.g.:
> * data URLs; in Gecko these appear to be parsed as part of the URL layer, because they can turn a URL invalid. Other browsers do not do this. Opinions? Should data URLs support .search?
> ...

I believe the behavior should be predictable and consistent no matter what the URI scheme is.

Best regards, Julian

PS: and no, I don't think "URL Standard" is a good name for this document.
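The two results quoted from Anne's message can be reproduced side by side. A rough sketch, assuming the URL class in current browsers and Node.js, which implements WHATWG-style parsing rather than RFC 3986 Section 5.2; the data: base below is an arbitrary placeholder.

```javascript
// Assumes the global URL class (current browsers, Node.js), which
// implements WHATWG parsing rather than RFC 3986 Section 5.2 resolution.
const dataBase = "data:text/plain,placeholder"; // arbitrary data URL base
const httpBase = "http://teehee";

// Base has a different scheme: "test" is taken as the host.
const againstData = new URL("http:test", dataBase).href; // "http://test/"

// Base has the same scheme: "test" is treated as a relative path.
const againstHttp = new URL("http:test", httpBase).href; // "http://teehee/test"

console.log(againstData);
console.log(againstHttp);
```

Under the strict algorithm of RFC 3986 Section 5.2.2, by contrast, a reference that carries its own scheme keeps its own components regardless of the base, which is where the "http:test" answer comes from. Neither result above is "resolution" in that sense.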