Re: [whatwg] [URL] Starting work on a URL spec
On Tue, 03 Aug 2010, Adam Barth w...@adambarth.com wrote: On Tue, Aug 3, 2010 at 8:21 AM, bjartur svartma...@gmail.com wrote: On 7/25/10 8:57 AM, Adam Barth wrote: It may not be an _html_ interoperability problem, but it's certainly a _web_ interoperability problem. It's a question of how HTTP messages are encoded (and in particular the encoding of the IRI). The WHATWG does not specify HTTP; these concerns should be directed to the IETF. There are various ways to spec lawyer things so you can make this work appear to be the responsibility of various folks. The work needs to be done. I'm inclined to do the work first and worry about what organization (if any) has jurisdiction later. Yeah, true. I've been through a repetitive "ask the county, ask the school authorities, ask the county" loop when asking my school to implement a SHOULD from the national government. *shrugs* But really, you should discuss this with the HTTP WG of the IETF by raising the issue on http...@hplb.hp.com. I recommend searching the archives, http://www.ics.uci.edu/pub/ietf/http/hypermail, for counter-arguments before posting, as this issue has probably been raised before. Then someone should fork RFC 2616 (or the latest working draft, if there's a current one). Patching the RFC == doing the work (good luck getting consensus on your side if you don't provide rationale, don't defend your decisions, and ignore the IETF, though)
Re: [whatwg] [URL] Starting work on a URL spec
On 7/25/10 8:57 AM, Adam Barth wrote: There's also the related question of what browsers should do with input typed into the URL field. Other than establishing that these rules may be different between the URL field and URLs present in content, I'm not sure this is amenable to spec. But perhaps a survey of what browsers do would be useful. I wasn't planning to cover that because it's not critical to interoperability Unfortunately, it is. In particular, servers need to know what to expect the browser to send if a user types non-ASCII into the url bar. There are real interoperability problems out there due to differing server and browser behavior in this regard. It may not be an _html_ interoperability problem, but it's certainly a _web_ interoperability problem. It's a question of how HTTP messages are encoded (and in particular the encoding of the IRI). The WHATWG does not specify HTTP; these concerns should be directed to the IETF.
Re: [whatwg] [URL] Starting work on a URL spec
On Tue, Aug 3, 2010 at 8:21 AM, bjartur svartma...@gmail.com wrote: On 7/25/10 8:57 AM, Adam Barth wrote: There's also the related question of what browsers should do with input typed into the URL field. Other than establishing that these rules may be different between the URL field and URLs present in content, I'm not sure this is amenable to spec. But perhaps a survey of what browsers do would be useful. I wasn't planning to cover that because it's not critical to interoperability Unfortunately, it is. In particular, servers need to know what to expect the browser to send if a user types non-ASCII into the url bar. There are real interoperability problems out there due to differing server and browser behavior in this regard. It may not be an _html_ interoperability problem, but it's certainly a _web_ interoperability problem. It's a question of how HTTP messages are encoded (and in particular the encoding of the IRI). The WHATWG does not specify HTTP; these concerns should be directed to the IETF. There are various ways to spec lawyer things so you can make this work appear to be the responsibility of various folks. The work needs to be done. I'm inclined to do the work first and worry about what organization (if any) has jurisdiction later. Adam
Re: [whatwg] [URL] Starting work on a URL spec
On Jul 25, 2010, at 5:57 AM, Adam Barth wrote: 2010/7/24 Maciej Stachowiak m...@apple.com: On Jul 24, 2010, at 9:55 AM, Adam Barth wrote: 2010/7/23 Ian Fette (イアンフェッティ) ife...@google.com: http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization lists some interesting cases we've come across on the anti-phishing team in Google. To the extent you're concerned with / interested in canonicalization, it may be worth taking a look at (not to suggest you follow that in determining how to parse/canonicalize URLs, but rather to make sure that you have some correct way of handling the listed URLs). Thanks. That's helpful. BTW, are you covering canonicalization? Yes. The three main things I'm hoping to cover are parsing, canonicalization, and resolving relative URLs. Is there any place in the Web platform where canonicalize is exposed by itself in a Web-facing way? I think resolve against a base and parse into components are the only algorithms whose effects can be observed directly. I think we only need to spec canonicalize if it turns out to be a useful subroutine. As far as I know, you can only see f(x) = canonicalize(parse(resolve(x))) and also some breakdown components of f(x) in HTMLAnchorElement and window.location.hash (and friends). Conceptually, it's a bit easier to think about them as three separate functions. The main difference between parse and canonicalize is that parse segments the input and canonicalize takes the segments, mutates them, and assembles them into a new string. I haven't studied resolve in as much detail yet, so I'm less clear how that fits into the puzzle. I would consider canonicalize() to be part of resolve(). Every time you retrieve a cooked URL (as opposed to original source text), you both resolve it against a possible base and canonicalize it as a single step. The two are not exposed separately.
It's not clear to me that making this operation into three separate steps with a parse in the middle is helpful, or even representative of a good implementation strategy. I would think of parse() as something that happens after canonicalization in the cases where single components of the URL are exposed. Regards, Maciej
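The single resolve+canonicalize step Maciej describes can be sketched with Python's urllib.parse, used here as a rough stdlib stand-in for a browser's URL code, not as the algorithm under discussion:

```python
from urllib.parse import urljoin

# urljoin both resolves a relative reference against a base and applies
# some normalization (e.g. removing "." and ".." path segments), much
# like the combined resolve-and-canonicalize step described above.
base = "http://example.com/a/b/c"
print(urljoin(base, "../d"))   # http://example.com/a/d
print(urljoin(base, "?q=1"))   # http://example.com/a/b/c?q=1
```

As in the browser case, the caller never sees an intermediate "resolved but not canonicalized" string; only the final result is exposed.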
Re: [whatwg] [URL] Starting work on a URL spec
On Jul 25, 2010, at 6:43 AM, Boris Zbarsky wrote: On 7/25/10 8:57 AM, Adam Barth wrote: There's also the related question of what browsers should do with input typed into the URL field. Other than establishing that these rules may be different between the URL field and URLs present in content, I'm not sure this is amenable to spec. But perhaps a survey of what browsers do would be useful. I wasn't planning to cover that because it's not critical to interoperability Unfortunately, it is. In particular, servers need to know what to expect the browser to send if a user types non-ASCII into the url bar. There are real interoperability problems out there due to differing server and browser behavior in this regard. It may not be an _html_ interoperability problem, but it's certainly a _web_ interoperability problem. There are also other considerations there because the URLs are displayed to users as security indicators. What's displayed is not a concern, in my opinion, in terms of interoperability. What's put on the wire is. The constraints that need to be imposed are much looser than on a href (e.g. we don't need to define exactly what url gets loaded if the user types monkey in the url bar), but sorting out the non-ASCII issue is definitely desirable. One thing to keep in mind is that browsers do all sorts of non-interoperable things for input that is not a valid URL, such as guessing that it is a hostname or performing a search with a search engine. So there's a limit to how much this can be spec'd. I agree that for certain URL-like strings that a user may type or cut-and-paste, there is an interop issue. Regards, Maciej
Re: [whatwg] [URL] Starting work on a URL spec
2010/7/26 Maciej Stachowiak m...@apple.com: On Jul 25, 2010, at 5:57 AM, Adam Barth wrote: 2010/7/24 Maciej Stachowiak m...@apple.com: On Jul 24, 2010, at 9:55 AM, Adam Barth wrote: 2010/7/23 Ian Fette (イアンフェッティ) ife...@google.com: http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization lists some interesting cases we've come across on the anti-phishing team in Google. To the extent you're concerned with / interested in canonicalization, it may be worth taking a look at (not to suggest you follow that in determining how to parse/canonicalize URLs, but rather to make sure that you have some correct way of handling the listed URLs). Thanks. That's helpful. BTW, are you covering canonicalization? Yes. The three main things I'm hoping to cover are parsing, canonicalization, and resolving relative URLs. Is there any place in the Web platform where canonicalize is exposed by itself in a Web-facing way? I think resolve against a base and parse into components are the only algorithms whose effects can be observed directly. I think we only need to spec canonicalize if it turns out to be a useful subroutine. As far as I know, you can only see f(x) = canonicalize(parse(resolve(x))) and also some breakdown components of f(x) in HTMLAnchorElement and window.location.hash (and friends). Conceptually, it's a bit easier to think about them as three separate functions. The main difference between parse and canonicalize is that parse segments the input and canonicalize takes the segments, mutates them, and assembles them into a new string. I haven't studied resolve in as much detail yet, so I'm less clear how that fits into the puzzle. I would consider canonicalize() to be part of resolve(). Every time you retrieve a cooked URL (as opposed to original source text), you both resolve it against a possible base and canonicalize it as a single step. The two are not exposed separately.
It's not clear to me that making this operation into three separate steps with a parse in the middle is helpful, or even representative of a good implementation strategy. I would think of parse() as something that happens after canonicalization in the cases where single components of the URL are exposed. That's an interesting way to think about what's going on. Different parts of the URL get different canonicalization transformations applied to them. For example, the range of characters that make sense in a host name are different than those that make sense in a port or query, so, in some sense, the canonicalization algorithm needs to understand something about how the URL parses, or at least how to distinguish host names from, e.g., ports and queries. Adam
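Adam's point that canonicalization is component-specific can be made concrete with urllib.parse (the URLs and the per-component rules here are illustrative assumptions, not the spec being discussed):

```python
from urllib.parse import urlsplit, quote

# Each component gets a different canonicalization treatment.
parts = urlsplit("http://ExAmPlE.com/Path?q=a b")
host = parts.hostname                  # hosts are case-insensitive, so lowercase
path = quote(parts.path, safe="/")     # paths keep "/" but percent-escape other bytes
query = quote(parts.query, safe="=&")  # queries keep "=" and "&" delimiters
print(host, path, query)               # example.com /Path q=a%20b
```

The characters left alone in the query (`=`, `&`) would be escaped in a host name, which is why the canonicalizer has to know which component it is looking at.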
Re: [whatwg] [URL] Starting work on a URL spec
On Jul 25, 2010, at 11:16 PM, Adam Barth wrote: 2010/7/26 Maciej Stachowiak m...@apple.com: On Jul 25, 2010, at 5:57 AM, Adam Barth wrote: 2010/7/24 Maciej Stachowiak m...@apple.com: On Jul 24, 2010, at 9:55 AM, Adam Barth wrote: 2010/7/23 Ian Fette (イアンフェッティ) ife...@google.com: http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization lists some interesting cases we've come across on the anti-phishing team in Google. To the extent you're concerned with / interested in canonicalization, it may be worth taking a look at (not to suggest you follow that in determining how to parse/canonicalize URLs, but rather to make sure that you have some correct way of handling the listed URLs). Thanks. That's helpful. BTW, are you covering canonicalization? Yes. The three main things I'm hoping to cover are parsing, canonicalization, and resolving relative URLs. Is there any place in the Web platform where canonicalize is exposed by itself in a Web-facing way? I think resolve against a base and parse into components are the only algorithms whose effects can be observed directly. I think we only need to spec canonicalize if it turns out to be a useful subroutine. As far as I know, you can only see f(x) = canonicalize(parse(resolve(x))) and also some breakdown components of f(x) in HTMLAnchorElement and window.location.hash (and friends). Conceptually, it's a bit easier to think about them as three separate functions. The main difference between parse and canonicalize is that parse segments the input and canonicalize takes the segments, mutates them, and assembles them into a new string. I haven't studied resolve in as much detail yet, so I'm less clear how that fits into the puzzle. I would consider canonicalize() to be part of resolve(). Every time you retrieve a cooked URL (as opposed to original source text), you both resolve it against a possible base and canonicalize it as a single step. The two are not exposed separately.
It's not clear to me that making this operation into three separate steps with a parse in the middle is helpful, or even representative of a good implementation strategy. I would think of parse() as something that happens after canonicalization in the cases where single components of the URL are exposed. That's an interesting way to think about what's going on. Different parts of the URL get different canonicalization transformations applied to them. For example, the range of characters that make sense in a host name are different than those that make sense in a port or query, so, in some sense, the canonicalization algorithm needs to understand something about how the URL parses, or at least how to distinguish host names from, e.g., ports and queries. Yes, but the relative resolution algorithm needs to find URL part boundaries as well. I guess part of the issue here is that we have two different senses of parse: (1) Find the URL component boundaries in a source string, to be used by other algorithms for reference purposes. In that sense, you may need to do it to both the base URL and the possibly-relative reference before resolve(). However, this step isn't really exposed directly to the Web. (2) Extract URL components of a resolved canonicalized URL, with the appropriate post-processing to expose them via APIs like Location and HTMLAnchorElement. I've been thinking of parse() in sense #2, since that is the version actually exposed as API. You can think of this as taking a resolved canonicalized URL as input, and having a tuple of strings representing the components as output. The only other public operation is resolve+canonicalize, which conceptually takes a base URL, a possibly relative URL reference, and an optional document encoding as input, and which produces the resolved canonicalized URL as output. While there are other ways to factor these operations, using a different approach will make it less obvious how to glue them to the relevant other specs. 
Regards, Maciej
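Maciej's sense #2 of "parse" can be sketched with the stdlib (again a stand-in, not the browser implementation): produce the resolved, canonicalized URL first, then split the result into the components that APIs like Location and HTMLAnchorElement expose.

```python
from urllib.parse import urljoin, urlsplit

# Step 1: resolve+canonicalize a reference against a base URL.
resolved = urljoin("http://example.com/dir/page.html", "other.html#top")

# Step 2 (sense #2 of "parse"): extract components of the finished URL.
parts = urlsplit(resolved)
print(parts.scheme, parts.netloc, parts.path, parts.fragment)
# http example.com /dir/other.html top
```

Note that the component boundaries found in step 1 (sense #1 of "parse") are internal to the resolver and never surface in this API.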
Re: [whatwg] [URL] Starting work on a URL spec
2010/7/24 Maciej Stachowiak m...@apple.com: On Jul 24, 2010, at 9:55 AM, Adam Barth wrote: 2010/7/23 Ian Fette (イアンフェッティ) ife...@google.com: http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization lists some interesting cases we've come across on the anti-phishing team in Google. To the extent you're concerned with / interested in canonicalization, it may be worth taking a look at (not to suggest you follow that in determining how to parse/canonicalize URLs, but rather to make sure that you have some correct way of handling the listed URLs). Thanks. That's helpful. BTW, are you covering canonicalization? Yes. The three main things I'm hoping to cover are parsing, canonicalization, and resolving relative URLs. Is there any place in the Web platform where canonicalize is exposed by itself in a Web-facing way? I think resolve against a base and parse into components are the only algorithms whose effects can be observed directly. I think we only need to spec canonicalize if it turns out to be a useful subroutine. As far as I know, you can only see f(x) = canonicalize(parse(resolve(x))) and also some breakdown components of f(x) in HTMLAnchorElement and window.location.hash (and friends). Conceptually, it's a bit easier to think about them as three separate functions. The main difference between parse and canonicalize is that parse segments the input and canonicalize takes the segments, mutates them, and assembles them into a new string. I haven't studied resolve in as much detail yet, so I'm less clear how that fits into the puzzle. There's also the related question of what browsers should do with input typed into the URL field. Other than establishing that these rules may be different between the URL field and URLs present in content, I'm not sure this is amenable to spec. But perhaps a survey of what browsers do would be useful.
I wasn't planning to cover that because it's not critical to interoperability, at least not in the same way that understanding what to do with the href attribute of the a tag is. There are also other considerations there because the URLs are displayed to users as security indicators. Adam
Re: [whatwg] [URL] Starting work on a URL spec
2010/7/25 Boris Zbarsky bzbar...@mit.edu: On 7/25/10 8:57 AM, Adam Barth wrote: There's also the related question of what browsers should do with input typed into the URL field. Other than establishing that these rules may be different between the URL field and URLs present in content, I'm not sure this is amenable to spec. But perhaps a survey of what browsers do would be useful. I wasn't planning to cover that because it's not critical to interoperability Unfortunately, it is. In particular, servers need to know what to expect the browser to send if a user types non-ASCII into the url bar. There are real interoperability problems out there due to differing server and browser behavior in this regard. It may not be an _html_ interoperability problem, but it's certainly a _web_ interoperability problem. There are also other considerations there because the URLs are displayed to users as security indicators. What's displayed is not a concern, in my opinion, in terms of interoperability. What's put on the wire is. The constraints that need to be imposed are much looser than on a href (e.g. we don't need to define exactly what url gets loaded if the user types monkey in the url bar), but sorting out the non-ASCII issue is definitely desirable. Okiedokes. I'll add that to my list of things to pay attention to. I can't promise I'll get to it in this round though. Thanks, Adam
Re: [whatwg] [URL] Starting work on a URL spec
On Sun, 25 Jul 2010, Adam Barth wrote: As far as I know, you can only see f(x) = canonicalize(parse(resolve(x))) and also some breakdown components of f(x) in HTMLAnchorElement and window.location.hash (and friends). Can you see the result of resolve(x) without seeing its result go through parse() and canonicalize()? If not, then we should just define resolve() as doing the canonicalize() step. -- Ian Hickson (http://ln.hixie.ch/) "Things that are impossible just take longer."
Re: [whatwg] [URL] Starting work on a URL spec
On Sun, Jul 25, 2010 at 6:05 PM, Ian Hickson i...@hixie.ch wrote: On Sun, 25 Jul 2010, Adam Barth wrote: As far as I know, you can only see f(x) = canonicalize(parse(resolve(x))) and also some breakdown components of f(x) in HTMLAnchorElement and window.location.hash (and friends). Can you see the result of resolve(x) without seeing its result go through parse() and canonicalize()? I don't know of any way to do that. I can tell you that in WebKit, the function that usually gets called to resolve URLs (called completeURL if you want to look it up in the source) returns a canonicalized URL. If not, then we should just define resolve() as doing the canonicalize() step. Yeah, what might make the most sense is to use canonicalize to post-process both resolving and parsing. We can choose the names so that calling the canonicalizing version is easy. Adam
Re: [whatwg] [URL] Starting work on a URL spec
On 7/25/10 3:05 PM, Adam Barth wrote: I don't know of any way to do that. I can tell you that in WebKit, the function that usually gets called to resolve URLs (called completeURL if you want to look it up in the source) returns a canonicalized URL. The same is true in Gecko. The way an nsIURI object is typically constructed is from an nsIURI base (possibly null) and a URL string, and the return value is resolved and canonicalized. -Boris
Re: [whatwg] [URL] Starting work on a URL spec
On 7/24/10 1:50 AM, Brett Zamir wrote: I would be particularly interested in data on this last, across different browsers, operating systems, and locales... There seem to be servers out there expecting their URIs in UTF-8 and others expecting them in ISO-8859-1, and it's not clear to me how to make things work with them all. Seems to me that if they are not in UTF-8, they should be treated as bugs, even if that is not a de jure standard. Treated as bugs by whom? The scenario is that a user types some non-ASCII text in the url bar. This needs to be url-encoded to actually go on the wire, which raises the question of what encoding. If the user is using IRIs, the answer is UTF-8. A number of servers barf if you do this, especially because some server-side scripting languages (PHP, e.g., last I checked) default to URI-unescaping via something other than UTF-8. So some browsers encode the non-query part of the URI as UTF-8 and the query part as ... something (the user's default filesystem encoding, say, for lack of a better guess). Others always use UTF-8 (and end up with some servers not usable). Others... I have no idea. That's why I want data. ;) In particular, while the "just use UTF-8, and if the user can't access the site, sucks to be the user" approach has a certain theoretical-purity appeal, it doesn't seem like something I want to do to my friends and family (always a good criterion for things you'd like to do to users). -Boris
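The encoding ambiguity Boris describes can be made concrete in Python (the input word is just an illustrative example): the same non-ASCII text produces different bytes on the wire depending on which encoding is applied before percent-encoding.

```python
from urllib.parse import quote

# "ü" is one byte in ISO-8859-1 but two bytes in UTF-8, so the
# percent-encoded forms differ; a server expecting one encoding
# will misdecode a request sent in the other.
word = "münchen"
print(quote(word, encoding="utf-8"))       # m%C3%BCnchen
print(quote(word, encoding="iso-8859-1"))  # m%FCnchen
```

A server that unescapes `%FC` as UTF-8 (or `%C3%BC` as ISO-8859-1) gets mojibake, which is exactly the interop problem being debated.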
Re: [whatwg] [URL] Starting work on a URL spec
On 7/24/10 2:49 AM, Brett Zamir wrote: By the servers/scripting languages. While it is great that the browsers are involved in the process, I think it would be reasonable to invite the other stakeholders to join the discussions. If they're willing to talk to us, great. My past experience talking to server developers has been ... suboptimal enough that now I just route around the damage instead, by default. You may be right that in this case that's not a good idea. Hopefully to be fixed in PHP6 with its promise of full Unicode support... Though per http://www.slideshare.net/kfish/unicode-php6-presentation : Right. Not holding my breath yet. ;) What I meant is to try to get the server systems on board to fix the issue, including in the long term. I appreciate you all being admirably practical champions of present-day compatibility, though I'd hope there is a vision to make things work better for the future. Yep. That vision is: always use UTF-8; there are just coordination problems getting there. -Boris
Re: [whatwg] [URL] Starting work on a URL spec
2010/7/23 Ian Fette (イアンフェッティ) ife...@google.com: http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization lists some interesting cases we've come across on the anti-phishing team in Google. To the extent you're concerned with / interested in canonicalization, it may be worth taking a look at (not to suggest you follow that in determining how to parse/canonicalize URLs, but rather to make sure that you have some correct way of handling the listed URLs). Thanks. That's helpful. BTW, are you covering canonicalization? Yes. The three main things I'm hoping to cover are parsing, canonicalization, and resolving relative URLs. Adam On Fri, Jul 23, 2010 at 9:02 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 7/23/10 11:59 PM, Silvia Pfeiffer wrote: Is that URLs as values of attributes in HTML or is that URLs as pasted into the address bar? I believe their processing differs... It certainly does in Firefox (the latter have a lot more fixup done to them, and there are also differences in terms of how character encodings are handled). I would be particularly interested in data on this last, across different browsers, operating systems, and locales... There seem to be servers out there expecting their URIs in UTF-8 and others expecting them in ISO-8859-1, and it's not clear to me how to make things work with them all. -Boris
Re: [whatwg] [URL] Starting work on a URL spec
On Fri, Jul 23, 2010 at 8:59 PM, Silvia Pfeiffer silviapfeiff...@gmail.comwrote: Is that URLs as values of attributes in HTML or is that URLs as pasted into the address bar? I believe their processing differs... I strongly suggest ignoring browser address bars. As the author of most of the Chromium omnibox code, I can testify that there's a ton of fixup, heuristics, and other stuff that's designed to get the user what they want that should never be in a spec. I think limiting the scope to URLs consumed as part of web content makes more sense. PK
Re: [whatwg] [URL] Starting work on a URL spec
On Jul 24, 2010, at 9:55 AM, Adam Barth wrote: 2010/7/23 Ian Fette (イアンフェッティ) ife...@google.com: http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization lists some interesting cases we've come across on the anti-phishing team in Google. To the extent you're concerned with / interested in canonicalization, it may be worth taking a look at (not to suggest you follow that in determining how to parse/canonicalize URLs, but rather to make sure that you have some correct way of handling the listed URLs). Thanks. That's helpful. BTW, are you covering canonicalization? Yes. The three main things I'm hoping to cover are parsing, canonicalization, and resolving relative URLs. Is there any place in the Web platform where canonicalize is exposed by itself in a Web-facing way? I think resolve against a base and parse into components are the only algorithms whose effects can be observed directly. I think we only need to spec canonicalize if it turns out to be a useful subroutine. There's also the related question of what browsers should do with input typed into the URL field. Other than establishing that these rules may be different between the URL field and URLs present in content, I'm not sure this is amenable to spec. But perhaps a survey of what browsers do would be useful. Regards, Maciej
Re: [whatwg] [URL] Starting work on a URL spec
On 7/23/10 3:11 PM, Adam Barth wrote: Please let me know if you know of any public URL parsing test suites. My main starting point will be the WebKit URL parsing test suite. There's a bit at http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/test_standardurl.js I thought there was some other stuff there too, but can't find it at the moment. This only tests authority URLs. -Boris
Re: [whatwg] [URL] Starting work on a URL spec
On Fri, 23 Jul 2010 21:11:35 +0200, Adam Barth w...@adambarth.com wrote: I've begun working on a specification for how browsers process URLs: http://github.com/abarth/url-spec The repository is currently empty, but I'll be adding the basic skeleton over the next few weeks. My intention is to triangulate between how IE, Firefox, Chrome, Safari, and Opera process URLs to find an algorithm that is both compatible with the web and moderately sane. Good luck ;) Seriously, it is probably worth looking at things like curl that are not browsers but consume URLs (and in turn are used by various systems that interact with URLs for things like software updates, synchronisation, etc.). cheers Chaals -- Charles McCathieNevile, Opera Software, Standards Group -- I speak French -- I speak Spanish -- I'm learning Norwegian http://my.opera.com/chaals Try Opera: http://www.opera.com
Re: [whatwg] [URL] Starting work on a URL spec
Is that URLs as values of attributes in HTML, or is that URLs as pasted into the address bar? I believe their processing differs... Good luck with it, anyway. I'm sure you've seen http://esw.w3.org/UriTesting. Cheers, Silvia. On Sat, Jul 24, 2010 at 5:11 AM, Adam Barth w...@adambarth.com wrote: I've begun working on a specification for how browsers process URLs: http://github.com/abarth/url-spec The repository is currently empty, but I'll be adding the basic skeleton over the next few weeks. My intention is to triangulate between how IE, Firefox, Chrome, Safari, and Opera process URLs to find an algorithm that is both compatible with the web and moderately sane. Please let me know if you know of any public URL parsing test suites. My main starting point will be the WebKit URL parsing test suite, http://trac.webkit.org/browser/trunk/LayoutTests/fast/url which was adapted from the GURL parsing library. Thanks, Adam
Re: [whatwg] [URL] Starting work on a URL spec
On 7/23/10 11:59 PM, Silvia Pfeiffer wrote: Is that URLs as values of attributes in HTML or is that URLs as pasted into the address bar? I believe their processing differs... It certainly does in Firefox (the latter have a lot more fixup done to them, and there are also differences in terms of how character encodings are handled). I would be particularly interested in data on this last, across different browsers, operating systems, and locales... There seem to be servers out there expecting their URIs in UTF-8 and others expecting them in ISO-8859-1, and it's not clear to me how to make things work with them all. -Boris
Re: [whatwg] [URL] Starting work on a URL spec
http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization lists some interesting cases we've come across on the anti-phishing team in Google. To the extent you're concerned with / interested in canonicalization, it may be worth taking a look at (not to suggest you follow that in determining how to parse/canonicalize URLs, but rather to make sure that you have some correct way of handling the listed URLs). BTW, are you covering canonicalization? -Ian On Fri, Jul 23, 2010 at 9:02 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 7/23/10 11:59 PM, Silvia Pfeiffer wrote: Is that URLs as values of attributes in HTML or is that URLs as pasted into the address bar? I believe their processing differs... It certainly does in Firefox (the latter have a lot more fixup done to them, and there are also differences in terms of how character encodings are handled). I would be particularly interested in data on this last, across different browsers, operating systems, and locales... There seem to be servers out there expecting their URIs in UTF-8 and others expecting them in ISO-8859-1, and it's not clear to me how to make things work with them all. -Boris
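One of the cases the linked Safe Browsing guide covers is URLs that have been percent-encoded more than once. A minimal sketch of that one canonicalization step, as an assumption based on the guide rather than the browser URL algorithm under discussion:

```python
from urllib.parse import unquote

def fully_decode(s: str) -> str:
    """Percent-decode repeatedly until the string stops changing."""
    decoded = unquote(s)
    while decoded != s:
        s, decoded = decoded, unquote(decoded)
    return s

# "%2525" decodes to "%25", then to "%", then stabilizes.
print(fully_decode("http://host/%2525"))  # http://host/%
```

Repeated decoding matters for anti-phishing because `%2525`-style double encoding can otherwise hide a hostname or path from a blocklist lookup.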
Re: [whatwg] [URL] Starting work on a URL spec
On 7/24/2010 12:02 PM, Boris Zbarsky wrote: On 7/23/10 11:59 PM, Silvia Pfeiffer wrote: Is that URLs as values of attributes in HTML or is that URLs as pasted into the address bar? I believe their processing differs... It certainly does in Firefox (the latter have a lot more fixup done to them, and there are also differences in terms of how character encodings are handled). I would be particularly interested in data on this last, across different browsers, operating systems, and locales... There seem to be servers out there expecting their URIs in UTF-8 and others expecting them in ISO-8859-1, and it's not clear to me how to make things work with them all. Seems to me that if they are not in UTF-8, they should be treated as bugs, even if that is not a de jure standard. Brett