Re: [whatwg] Web Addresses vs Legacy Extended IRI (again)
On Sun, 29 Mar 2009, Giovanni Campagna wrote:
> (In this email I will use "URL5" as shorthand for Web Addresses, as
> that was previously the URL part of HTML5.)

This section is to be extracted from HTML5 shortly. I've forwarded your
e-mails to DanC, the editor of the Web Addresses spec.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Web Addresses vs Legacy Extended IRI
On Sun, 22 Mar 2009, Giovanni Campagna wrote:
> > As far as I can tell the LEIRI requirements aren't actually an
> > accurate description of what browsers do.
>
> My question was more specific: what are the *technical differences*
> between LEIRI and Web Addresses?

I don't think there's complete documentation of this anywhere.

> Can't we have one technology instead of two?

Web addresses and LEIRIs are both maintained by the W3C and the IETF
now, so I recommend discussing this with the editors of the relevant
specs. The relevant section is going to be removed from HTML5 as soon
as practical.

-- 
Ian Hickson
Re: [whatwg] Web Addresses vs Legacy Extended IRI
On Mon, 23 Mar 2009, Julian Reschke wrote:
> Ian Hickson wrote:
> > Note that the Web addresses draft isn't specific to HTML5. It is
> > intended to apply to any user agent that interacts with Web content,
> > not just Web browsers and HTML. (That's why we took it out of HTML5.)
>
> Be careful; that depends on what you call Web content. For instance, I
> would consider Atom feed content (RFC 4287) to be Web content, but Atom
> really uses IRIs, and doesn't need workarounds for broken IRIs in
> content (as far as I can tell).

There are implementations of Atom that treat the URLs therein just like
those in HTML content. I haven't studied existing content to see if this
is required for compatibility, though. I wouldn't be surprised if it
was, since much Atom content is generated from content that is primarily
intended for HTML.

> Don't leak workarounds into areas where they aren't needed.

I'd much rather we had one set of interpretations of URLs, defined in
one place, than the four or more we have now.

-- 
Ian Hickson
Re: [whatwg] Web Addresses vs Legacy Extended IRI (again)
2009/3/29 Kristof Zelechovski giecr...@stegny.2a.pl:
> It is not clear that the server will be able to correctly support
> various representations of characters in the path component, e.g.
> identify accented characters with their decompositions using combining
> diacritical marks. The peculiarities can depend on the underlying file
> system conventions. Therefore, if all representations are considered
> equally appropriate, various resources may suddenly become unavailable,
> depending on the encoding decisions taken by the user agent.
> Chris

It is not clear to me that the server will be able to support the
composed form of à or ø. Where is the conversion from ISO-8859-1 to UCS
specified? Nowhere.

If a server knows it cannot deal with Unicode Normalization, it should
either use an encoding form of Unicode (UTF-8, UTF-16), implement a
technology that uses IRIs directly (because normalization is introduced
only when converting to a URI), or generate IRIs with binary path data
in opaque form (i.e. percent-encoded).

By the way, the server should be able to deal with both composed and
decomposed forms of accented characters (or use neither of them),
because I may type the path directly into my address bar (do you know
what IME I use?).

Giovanni
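Giovanni's suggestion, that a server handle both composed and decomposed
spellings, can be sketched as a canonicalization step on the decoded
path. This is only an illustration of the idea, not anything the thread
or a spec mandates; `canonical_path` is a hypothetical name.

```python
import unicodedata
from urllib.parse import unquote

def canonical_path(raw_path: str) -> str:
    """Percent-decode (assuming UTF-8), then normalize to NFC, so that
    composed and decomposed spellings map to the same resource key."""
    return unicodedata.normalize("NFC", unquote(raw_path, encoding="utf-8"))

# Composed (%C3%A0 is à, U+00E0) and decomposed (a%CC%80 is a + U+0300)
# both canonicalize to the same path:
assert canonical_path("/caf%C3%A0") == canonical_path("/cafa%CC%80")
```

A server built this way is indifferent to the encoding decisions taken
by the user agent, which is exactly the failure mode Kristof worries
about.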
Re: [whatwg] Web Addresses vs Legacy Extended IRI (again)
2009/3/29 Anne van Kesteren ann...@opera.com:
> On Sun, 29 Mar 2009 14:37:19 +0200, Giovanni Campagna
> scampa.giova...@gmail.com wrote:
> > Summing up, the differences between URL5 and LEIRI are only about the
> > percent sign and its uses for delimiters.
>
> I'm not sure if you're correct about those differences, but even if you
> are they are not the only differences. E.g. LEIRIs perform
> normalization if the input encoding is non-Unicode. URLs do not. URLs
> can encode their query component per the input encoding (and do so for
> HTML and some APIs). LEIRIs cannot.

What is the problem with normalization? Is there a standard for
conversion from non-Unicode to Unicode? I guess not, so normalization
(which should always be done) is perfectly legal.

In addition, IRIs are defined as a sequence of Unicode codepoints. It
does not matter how those codepoints are stored (ASCII, ISO-8859-1,
UTF-8), only the Unicode version of them. This is the same as URL5s, by
the way, because neither of them is defined on octets and both use the
RFC 3986 method for percent-encoding (using UTF-8).

> (Also, I'm not sure if the WHATWG list is the right place to discuss
> this as the editor of the new draft might not read this list at all.)

Unfortunately, I cannot join the public-html list. I could cross-post
this to www-html or www-archive, but it would break the archives and
make it difficult to follow.

Giovanni
Re: [whatwg] Web Addresses vs Legacy Extended IRI (again)
On Sun, 29 Mar 2009 15:01:51 +0200, Giovanni Campagna
scampa.giova...@gmail.com wrote:
> What is the problem with normalization? Is there a standard for
> conversion from non-Unicode to Unicode? I guess not, so normalization
> (which should always be done) is perfectly legal.

It's about Unicode Normalization. (And it should not always be done.)

> In addition, IRIs are defined as a sequence of Unicode codepoints. It
> does not matter how those codepoints are stored (ASCII, ISO-8859-1,
> UTF-8), only the Unicode version of them.

Please read the IRI specification again. Specifically section 3.1.

> This is the same as URL5s, by the way, because neither of them is
> defined on octets and both use the RFC 3986 method for percent-encoding
> (using UTF-8).

No, it's not always using UTF-8.

-- 
Anne van Kesteren
http://annevankesteren.nl/
Re: [whatwg] Web Addresses vs Legacy Extended IRI (again)
2009/3/29 Anne van Kesteren ann...@opera.com:
> On Sun, 29 Mar 2009 15:01:51 +0200, Giovanni Campagna
> scampa.giova...@gmail.com wrote:
> > What is the problem with normalization? Is there a standard for
> > conversion from non-Unicode to Unicode? I guess not, so normalization
> > (which should always be done) is perfectly legal.
>
> It's about Unicode Normalization. (And it should not always be done.)

If I convert from ISO-8859-1 and find À (decimal 192), I can emit either
À (U+00C0 LATIN CAPITAL LETTER A WITH GRAVE) or A (U+0041 LATIN CAPITAL
LETTER A) followed by U+0300 COMBINING GRAVE ACCENT. One is NFC, the
other is NFD, and both are legal and simple.

> > In addition, IRIs are defined as a sequence of Unicode codepoints. It
> > does not matter how those codepoints are stored (ASCII, ISO-8859-1,
> > UTF-8), only the Unicode version of them.
>
> Please read the IRI specification again. Specifically section 3.1.

The specification says that an IRI must be in normalized UCS when
created from user input; otherwise it must be converted to Unicode if
not already (and the conversion should be normalizing); otherwise it
must be converted from UTF-8/16/32 to UCS but not normalized. I don't
see a particular problem in this.

> > This is the same as URL5s, by the way, because neither of them is
> > defined on octets and both use the RFC 3986 method for
> > percent-encoding (using UTF-8).
>
> No, it's not always using UTF-8.

RFC 3986 never creates percent-encoding itself (percent-encoding is used
for unspecified binary data), but it says that text components should be
encoded as UTF-8 and that the rules are established by scheme-specific
syntaxes.

Giovanni
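Giovanni's À example is easy to verify mechanically. A short Python
sketch (everything here is standard library; nothing is taken from the
thread beyond the codepoints it names) shows that the two normalization
forms are both valid Unicode yet percent-encode to different octet
sequences, which is why the choice matters on the wire:

```python
import unicodedata
from urllib.parse import quote

# Byte 0xC0 (decimal 192) in ISO-8859-1 decodes to U+00C0.
ch = bytes([0xC0]).decode("iso-8859-1")

nfc = unicodedata.normalize("NFC", ch)  # U+00C0, precomposed
nfd = unicodedata.normalize("NFD", ch)  # U+0041 + U+0300, decomposed

# Both are legal Unicode, but they percent-encode differently, so a
# server comparing raw octets sees two distinct paths.
print(quote(nfc.encode("utf-8")))  # %C3%80
print(quote(nfd.encode("utf-8")))  # A%CC%80
```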
Re: [whatwg] Web Addresses vs Legacy Extended IRI (again)
It is not clear that the server will be able to correctly support
various representations of characters in the path component, e.g.
identify accented characters with their decompositions using combining
diacritical marks. The peculiarities can depend on the underlying file
system conventions. Therefore, if all representations are considered
equally appropriate, various resources may suddenly become unavailable,
depending on the encoding decisions taken by the user agent.

Chris
Re: [whatwg] Web Addresses vs Legacy Extended IRI
Ian Hickson wrote:
> Note that the Web addresses draft isn't specific to HTML5. It is
> intended to apply to any user agent that interacts with Web content,
> not just Web browsers and HTML. (That's why we took it out of HTML5.)

Be careful; that depends on what you call Web content. For instance, I
would consider Atom feed content (RFC 4287) to be Web content, but Atom
really uses IRIs, and doesn't need workarounds for broken IRIs in
content (as far as I can tell).

Don't leak workarounds into areas where they aren't needed.

BR, Julian
Re: [whatwg] Web Addresses vs Legacy Extended IRI
On Mon, 23 Mar 2009 09:45:39 +0100, Julian Reschke julian.resc...@gmx.de
wrote:
> Be careful; that depends on what you call Web content. For instance, I
> would consider Atom feed content (RFC 4287) to be Web content, but Atom
> really uses IRIs, and doesn't need workarounds for broken IRIs in
> content (as far as I can tell).

Are you sure browser implementations of feeds reject non-IRIs in some
way? I would expect them to use the same URL handling everywhere.

> Don't leak workarounds into areas where they aren't needed.

I'm not convinced that having two ways of handling essentially the same
thing is good.

-- 
Anne van Kesteren
http://annevankesteren.nl/
Re: [whatwg] Web Addresses vs Legacy Extended IRI
Anne van Kesteren wrote:
> Are you sure browser implementations of feeds reject non-IRIs in some
> way? I would expect them to use the same URL handling everywhere.

I wasn't talking about browser implementations of feeds, but feed
readers in general.

> I'm not convinced that having two ways of handling essentially the same
> thing is good.

It's unavoidable, as the relaxed syntax doesn't work in many cases, for
instance, when whitespace acts as a delimiter.

BR, Julian
Re: [whatwg] Web Addresses vs Legacy Extended IRI
On Mon, 23 Mar 2009 11:25:19 +0100, Julian Reschke julian.resc...@gmx.de
wrote:
> I wasn't talking about browser implementations of feeds, but feed
> readers in general.

Well yes, and a subset of those is browser based. Besides that, most
feed readers handle HTML. Do you think they should have two separate URL
parsing functions?

> It's unavoidable, as the relaxed syntax doesn't work in many cases, for
> instance, when whitespace acts as a delimiter.

Obviously you would first split on whitespace and then parse the URLs.
You can still use the same generic URL handling.

-- 
Anne van Kesteren
http://annevankesteren.nl/
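Anne's "split first, then parse" point can be sketched in a few lines: a
context that uses whitespace as a delimiter (say, an attribute carrying
a list of URLs) tokenizes before handing each token to the one generic
URL parser. The function name here is illustrative, not from any spec:

```python
from urllib.parse import SplitResult, urlsplit

def parse_url_list(value: str) -> list[SplitResult]:
    """Split a whitespace-delimited list of URLs, then parse each token
    with the same generic URL parser used everywhere else. The
    tokenizer, not the URL grammar, resolves the delimiter ambiguity."""
    return [urlsplit(token) for token in value.split()]

parts = parse_url_list("http://example.com/a  http://example.org/b")
assert [p.netloc for p in parts] == ["example.com", "example.org"]
```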
Re: [whatwg] Web Addresses vs Legacy Extended IRI
Anne van Kesteren wrote:
> Well yes, and a subset of those is browser based. Besides that, most
> feed readers handle HTML. Do you think they should have two separate
> URL parsing functions?

Yes, absolutely.

> Obviously you would first split on whitespace and then parse the URLs.
> You can still use the same generic URL handling.

In which case IRI handling should be totally sufficient.

Best regards, Julian
Re: [whatwg] Web Addresses vs Legacy Extended IRI
On Mon, 23 Mar 2009 11:31:01 +0100, Julian Reschke julian.resc...@gmx.de
wrote:
> > Well yes, and a subset of those is browser based. Besides that, most
> > feed readers handle HTML. Do you think they should have two separate
> > URL parsing functions?
>
> Yes, absolutely.

Why?

> > Obviously you would first split on whitespace and then parse the
> > URLs. You can still use the same generic URL handling.
>
> In which case IRI handling should be totally sufficient.

I don't follow. I said "I'm not convinced that having two ways of
handling essentially the same thing is good." Then you said "It's
unavoidable." Then I pointed out it is avoidable. And then you say this.
It doesn't add up.

-- 
Anne van Kesteren
http://annevankesteren.nl/
Re: [whatwg] Web Addresses vs Legacy Extended IRI
Anne van Kesteren wrote:
> > > Do you think they should have two separate URL parsing functions?
> >
> > Yes, absolutely.
>
> Why?

Because it's preferable to the alternative, which is leaking the
non-conformant URI/IRI handling into other places.

> I don't follow. I said "I'm not convinced that having two ways of
> handling essentially the same thing is good." Then you said "It's
> unavoidable." Then I pointed out it is avoidable. And then you say
> this. It doesn't add up.

The issue is that it's *not* the same thing.

BR, Julian
Re: [whatwg] Web Addresses vs Legacy Extended IRI
On Mon, 23 Mar 2009 11:46:15 +0100, Julian Reschke julian.resc...@gmx.de
wrote:
> Because it's preferable to the alternative, which is leaking the
> non-conformant URI/IRI handling into other places.

Apparently that is already happening in part anyway due to LEIRIs.
Modulo the URL encoding bit (which you can set to always be UTF-8 for
non-HTML contexts), I'm not sure what's so bad about allowing a few more
characters.

> The issue is that it's *not* the same thing.

Well, no, not exactly. But they perform essentially the same task,
modulo a few characters. And since one is a superset of the other (as
long as the URL encoding is UTF-8), I don't see a point in having both.

-- 
Anne van Kesteren
http://annevankesteren.nl/
Re: [whatwg] Web Addresses vs Legacy Extended IRI
Anne van Kesteren wrote:
> Apparently that is already happening in part anyway due to LEIRIs.
> Modulo the URL encoding bit (which you can set to always be UTF-8 for
> non-HTML contexts), I'm not sure what's so bad about allowing a few
> more characters.

Whitespace is a big issue - auto-highlighting will fail all over the
place.

> Well, no, not exactly. But they perform essentially the same task,
> modulo a few characters. And since one is a superset of the other (as
> long as the URL encoding is UTF-8), I don't see a point in having both.

Well, then let's just agree that we disagree on that.

BR, Julian
Re: [whatwg] Web Addresses vs Legacy Extended IRI
On Mon, 23 Mar 2009 11:58:59 +0100, Julian Reschke julian.resc...@gmx.de
wrote:
> Whitespace is a big issue - auto-highlighting will fail all over the
> place.

Auto-highlighting and linking code already fails all over the place due
to e.g. punctuation issues. A solution for whitespace specifically is to
simply forbid it, but still require parsers to handle it, as browsers
already do for HTML and XMLHttpRequest. Apparently browsers also handle
it for HTTP, as otherwise e.g. http://www.usafa.af.mil/ would not work:
it returns a 302 with "Location: index.cfm?catname=AFA Homepage".
Similarly, http://www.flightsimulator.nl/ gives a URL in the Location
header that contains a \ which is also illegal, but it is handled fine.
(Thanks to Philip`.)

(Whitespace is one of the things LEIRIs introduce, by the way.)

> Well, then let's just agree that we disagree on that.

I would still be interested in hearing your point. Is it whitespace?

-- 
Anne van Kesteren
http://annevankesteren.nl/
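The recovery Anne observes for invalid Location values can be sketched
as a small cleanup pass. To be clear, this is a hypothetical heuristic
modeled only on the two behaviors the email mentions (spaces and
backslashes surviving in Location headers); it is not a spec algorithm,
and `lenient_location` is an invented name:

```python
def lenient_location(raw: str) -> str:
    """Hypothetical browser-style recovery for an invalid Location
    value: trim surrounding whitespace, map backslash to slash, and
    percent-encode any spaces left inside the value."""
    cleaned = raw.strip().replace("\\", "/")
    return cleaned.replace(" ", "%20")

# The invalid redirect from the example above becomes fetchable:
assert lenient_location("index.cfm?catname=AFA Homepage") == \
    "index.cfm?catname=AFA%20Homepage"
```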
Re: [whatwg] Web Addresses vs Legacy Extended IRI
Anne van Kesteren wrote:
> I would still be interested in hearing your point. Is it whitespace?

...and other characters that are not allowed in URIs and IRIs, such as {
and } (which therefore can be used as delimiters).

BR, Julian
Re: [whatwg] Web Addresses vs Legacy Extended IRI
Anne van Kesteren wrote:
> On Mon, 23 Mar 2009 12:50:46 +0100, Julian Reschke
> julian.resc...@gmx.de wrote:
> > ...and other characters that are not allowed in URIs and IRIs, such
> > as { and } (which therefore can be used as delimiters).
>
> And keeping them invalid but requiring user agents to handle those
> characters as part of a URL (after it has been determined what the URL
> is for a given context) does not work because?

You are essentially proposing to change existing specifications (such as
Atom). I just do not see the point. If you think it's worthwhile,
propose that change to the relevant standards body (in this case the
IETF Applications Area).

BR, Julian
Re: [whatwg] Web Addresses vs Legacy Extended IRI
On Mon, 23 Mar 2009 12:50:46 +0100, Julian Reschke julian.resc...@gmx.de
wrote:
> ...and other characters that are not allowed in URIs and IRIs, such as
> { and } (which therefore can be used as delimiters).

And keeping them invalid but requiring user agents to handle those
characters as part of a URL (after it has been determined what the URL
is for a given context) does not work because?

-- 
Anne van Kesteren
http://annevankesteren.nl/
Re: [whatwg] Web Addresses vs Legacy Extended IRI
On Mon, 23 Mar 2009, Julian Reschke wrote:
> You are essentially proposing to change existing specifications (such
> as Atom). I just do not see the point.

The point is to ensure there is only one way to handle strings that are
purported to be IRIs but that are invalid. Right now, there are at least
three different ways to do it: the way the URI/IRI specs say, the way
the LEIRI docs say, and the way legacy HTML content relies on.

My understanding is that even command-line software, feed readers, and
other non-browser tools disagree with the specs here. For example, curl
will not refuse to fetch the URL http://example.com/% despite that URL
being invalid. Thus, we need a spec they are willing to follow.

The idea of not limiting it to HTML is to prevent tools that deal both
with HTML and with other languages (like Atom, CSS, DOM APIs, etc.) from
having to have two different implementations if they want to be
conforming.

> If you think it's worthwhile, propose that change to the relevant
> standards body (in this case the IETF Applications Area).

This was the first thing we tried, but the people on the URI lists were
not interested in making their specs useful for the real world. We are
now routing around that negative energy. We're having a meeting later
this week to see if the IETF will adopt the spec anyway, though.

-- 
Ian Hickson
Re: [whatwg] Web Addresses vs Legacy Extended IRI
Ian Hickson wrote:
> My understanding is that even command-line software, feed readers, and
> other non-browser tools disagree with the specs here. For example, curl
> will not refuse to fetch the URL http://example.com/% despite that URL
> being invalid.

Should it refuse to?

> Thus, we need a spec they are willing to follow. The idea of not
> limiting it to HTML is to prevent tools that deal both with HTML and
> with other languages (like Atom, CSS, DOM APIs, etc.) from having to
> have two different implementations if they want to be conforming.

I understand that you want everybody to use the same rules, and you want
these rules to be the ones needed for HTML content. I disagree with
that. Do not leak that stuff into places where it's not needed. For
instance, there are lots of cases where the Atom feed format can be used
in the absence of HTML.

> This was the first thing we tried, but the people on the URI lists were
> not interested in making their specs useful for the real world. We are
> now routing around that negative energy. We're having a meeting later
> this week to see if the IETF will adopt the spec anyway, though.

Adopting the spec is not the same thing as mandating its use all over
the place.

BR, Julian
Re: [whatwg] Web Addresses vs Legacy Extended IRI
[cc'ed DanC since I don't think Dan is on the WHATWG list, and he's the
editor of the draft at this point]

On Mon, 23 Mar 2009, Julian Reschke wrote:
> > For example, curl will not refuse to fetch the URL
> > http://example.com/% despite that URL being invalid.
>
> Should it refuse to?

The URI/IRI specs don't say, because they don't cover error handling.
This is what the Web addresses spec is supposed to cover. It doesn't
change the rules for anything that the URI spec defines; it just also
says how to handle errors. That way, we can have interoperability across
all inputs.

I personally don't care if we say that http://example.com/% should be
thrown out or accepted. However, I _do_ care that we get something that
is widely and uniformly implemented, and the best way to do that is to
write a spec that matches what people have already implemented.

> I understand that you want everybody to use the same rules, and you
> want these rules to be the ones needed for HTML content. I disagree
> with that.

I want everyone to follow the same rules. I don't care what those rules
are, so long as everyone (or at least the vast majority of systems) is
willing to follow them. Right now, it seems to me that most systems do
the same thing, so it makes sense to follow what they do. This really
has nothing to do with HTML.

> Do not leak that stuff into places where it's not needed.

Interoperability and uniformity in implementations are important
everywhere. If there are areas that are self-contained and never
interact with the rest of the Internet, then they can do whatever they
like; I do not believe I have ever suggested doing anything to such
software. However, curl obviously isn't self-contained; people will take
URLs from e-mails and paste them into the command line to fetch files
from FTP servers, and we should ensure that this works the same way
whether the user is using Pine with wget or Mail.app with curl or any
other combination of mail client and download tool.

> For instance, there are lots of cases where the Atom feed format can be
> used in the absence of HTML.

Sure, but the tools that use Atom still need to process URLs in the same
way as other tools. It would be very bad if a site had an RSS feed and
an Atom feed that both said an item's URL was http://example.com/%, but
one feed resulted in one file being fetched and the other resulted in a
different file being fetched.

> Adopting the spec is not the same thing as mandating its use all over
> the place.

I think it is important that we have interoperable use of URLs in the
transitive closure of places that use URLs, starting from any common
starting point, like the URL-in-an-e-mail example above. I believe this
includes most if not all Internet software. I also believe that in
practice most software is already doing this, though often in subtly
different ways, since the URI and IRI specs did not define error
handling.

-- 
Ian Hickson
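The gap Ian describes, that RFC 3986 defines validity but not error
handling, is easy to demonstrate: a lenient parser and a strict
validator can legitimately disagree about http://example.com/%. In this
sketch, Python's `urlsplit` plays the lenient role (like curl, it passes
the lone "%" through untouched), and `validate_pct` is a hypothetical
strict check, not something any spec names:

```python
import re
from urllib.parse import urlsplit

url = "http://example.com/%"

# Lenient handling: the parser accepts the string and preserves the
# invalid lone "%" in the path, which then goes onto the wire as-is.
parts = urlsplit(url)
assert parts.netloc == "example.com" and parts.path == "/%"

def validate_pct(s: str) -> bool:
    """Hypothetical strict check: every "%" must begin a %XX escape
    with two hex digits, as the RFC 3986 grammar requires."""
    return all(re.fullmatch("[0-9A-Fa-f]{2}", s[i + 1:i + 3])
               for i, c in enumerate(s) if c == "%")

assert not validate_pct(url)                    # strict tool rejects
assert validate_pct("http://example.com/%20")   # valid escape accepted
```

Both behaviors are defensible under RFC 3986, which is precisely why an
error-handling spec is needed for the two tools to agree.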
Re: [whatwg] Web Addresses vs Legacy Extended IRI
Ian Hickson wrote: [cc'ed DanC since I don't think Dan is on the WHATWG list, and he's the editor of the draft at this point] On Mon, 23 Mar 2009, Julian Reschke wrote: For example, curl will not refuse to fetch the URL http://example.com/% despite that URL being invalid. Should it refuse to? The URI/IRI specs don't say, because they don't cover error handling. Indeed. This is what the Web addresses spec is supposed to cover. It doesn't change the rules for anything that the URI spec defines, it just also says how to handle errors. That way, we can have interoperability across all inputs. I personally don't care if we say that http://example.com/% should be thrown out or accepted. However, I _do_ care that we get something that is widely and uniformly implemented, and the best way to do that is to write a spec that matches what people have already implemented. I'm OK with doing that for browsers. I'm *very* skeptical about the idea that it needs to be the same way everywhere else. Thus, we need a spec they are willing to follow. The idea of not limiting it to HTML is to prevent tools that deal both with HTML and with other languages (like Atom, CSS, DOM APIs, etc) from having to have two different implementations if they want to be conforming. I understand that you want everybody to use the same rules, and you want these rules to be the ones needed for HTML content. I disagree with that. I want everyone to follow the same rules. I don't care what those rules are, so long as everyone (or at least, the vast majority of systems) are willing to follow them. Right now, it seems to me that most systems do the same thing, so it makes sense to follow what they do. This really has nothing to do with HTML. Your perspective on most systems differs from mine. Do not leak that stuff into places where it's not needed. Interoperability and uniformity in implementations is important everywhere. 
If there are areas that are self-contained and never interact with the rest of the Internet, then they can do whatever they like. I do not believe I have ever suggested doing anything to such software. However, 'curl' obviously isn't self-contained; people will take URLs from e-mails and paste them into the command line to fetch files from FTP servers, and we should ensure that this works the same way whether the user is using Pine with wget or Mail.app with curl or any other combination of mail client and download tool. How many people paste URLs into command lines? And of these, how many remember that they likely need to quote them? For instance, there are lots of cases where the Atom feed format can be used in absence of HTML. Sure, but the tools that use Atom still need to process URLs in the same way as other tools. It would be very bad if a site had an RSS feed and an Atom feed and they both said that the item's URL was http://example.com/% but in one feed that resulted in one file being fetched but in another it resulted in another file being fetched. Yes, that would be bad. However, what seems to be more likely is that one tool refuses to fetch the file (because the URI parser didn't like it), while in the other case, the tool puts the invalid URL on to the wire, in which case the server's behavior decides. I think this is totally ok, and the more tools reject the URL early, the better. If you think it's worthwhile, propose that change to the relevant standards body (in this case IETF Applications Area). This was the first thing we tried, but the people on the URI lists were not interested in making their specs useful for the real world. We are now routing around that negative energy. We're having a meeting later this week to see if the IETF will adopt the spec anyway, though. Adopting the spec is not the same thing as mandating its use all over the place. 
I think it is important that we have interoperable use of URLs in the transitive closure of places that use URLs, starting from any common starting point, like the URL in an e-mail example above. I believe this includes most if not all Internet software. I also believe that in practice most software is already doing this, though often in subtly different ways since the URI and IRI specs did not define error handling. If the consequence of this is that invalid URLs do not interoperate, then I think this is a *feature*, not a bug. Best regards, Julian
Re: [whatwg] Web Addresses vs Legacy Extended IRI
On Mon, 23 Mar 2009, Julian Reschke wrote:
> However, what seems more likely is that one tool refuses to fetch the
> file (because the URI parser didn't like it), while the other tool puts
> the invalid URL onto the wire

IMHO this is basically the definition of a standards failure.

> I think this is totally OK

I think considering this behaviour to be OK is basically ignoring 19
years of experience with the Web, which has shown repeatedly and at huge
cost that having different tools act differently in the same situation
is a bad idea and only causes end users to have a bad experience.

> If the consequence of this is that invalid URLs do not interoperate,
> then I think this is a *feature*, not a bug.

I fundamentally disagree. Users don't care what the source of a lack of
interoperability is. Whether it's an engineering error or a flaw in the
standard or a flaw in the content is irrelevant; the result is the same:
an unhappy user.

-- 
Ian Hickson
Re: [whatwg] Web Addresses vs Legacy Extended IRI
On Mar 23, 2009, at 2:25 PM, Ian Hickson wrote:

> On Mon, 23 Mar 2009, Julian Reschke wrote:
>> However, what seems to be more likely is that one tool refuses to fetch the file (because the URI parser didn't like it), while in the other case, the tool puts the invalid URL on to the wire
>
> IMHO this is basically the definition of a standards failure.
>
>> I think this is totally ok
>
> I think considering this behaviour to be ok is basically ignoring 19 years of experience with the Web, which has shown repeatedly and at huge cost that having different tools act differently in the same situation is a bad idea and only causes end users to have a bad experience.
>
>> If the consequence of this is that invalid URLs do not interoperate, then I think this is a *feature*, not a bug.
>
> I fundamentally disagree. Users don't care what the source of a lack of interoperability is. Whether it's an engineering error or a flaw in the standard or a flaw in the content is irrelevant; the result is the same: an unhappy user.

I largely agree with Ian's perspective on this. The primary purpose of standards is to enable interoperability; therefore, failure to interoperate is by definition a standards failure (either in the design of the standard or in correct implementation of the standard).

Regards,
Maciej
Re: [whatwg] Web Addresses vs Legacy Extended IRI
On Sat, 21 Mar 2009, Giovanni Campagna wrote:

> Now I would like to ask: are there any major differences that require the W3C / WHATWG to publish another specification, just for HTML5, instead of just referencing the IRI-bis draft or the LEIRI working group note?

As far as I can tell the LEIRI requirements aren't actually an accurate description of what browsers do.

Note that the Web addresses draft isn't specific to HTML5. It is intended to apply to any user agent that interacts with Web content, not just Web browsers and HTML. (That's why we took it out of HTML5.)

-- 
Ian Hickson               U+1047E                )\._.,--,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Web Addresses vs Legacy Extended IRI
2009/3/22 Ian Hickson i...@hixie.ch:

> On Sat, 21 Mar 2009, Giovanni Campagna wrote:
>> Now I would like to ask: are there any major differences that require the W3C / WHATWG to publish another specification, just for HTML5, instead of just referencing the IRI-bis draft or the LEIRI working group note?
>
> As far as I can tell the LEIRI requirements aren't actually an accurate description of what browsers do.

My question was more specific: what are the *technical differences* between LEIRI and Web Addresses? Can't we have one technology instead of two?

> Note that the Web addresses draft isn't specific to HTML5. It is intended to apply to any user agent that interacts with Web content, not just Web browsers and HTML. (That's why we took it out of HTML5.)

Unfortunately, languages outside HTML5 (notably XLink, XML Base, SVG, XForms) that use the W3C Schema definition of the anyURI type use exactly LEIRIs. Other technologies instead use pure URIs / IRIs (XMLNS, RDF), and I wouldn't see much benefit in relaxing their syntax (because they never actually process their identifiers).

> -- 
> Ian Hickson               U+1047E                )\._.,--,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Giovanni
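The LEIRI relaxation Giovanni refers to can be sketched as follows: a LEIRI (or XML Schema anyURI value) may contain characters a URI may not, such as a raw space, and is mapped to a URI by percent-encoding them. This is an editorial illustration using Python's standard library, not part of the thread; the `safe` set below is an assumption chosen to leave URI delimiters and existing escapes untouched.

```python
from urllib.parse import quote

# A value acceptable as a LEIRI / XML Schema anyURI but not a valid
# URI, because it contains a raw space.
leiri = "http://example.com/a b"

# LEIRI-to-URI mapping: percent-encode the characters a URI may not
# contain. The safe set keeps the URI delimiters unescaped, and "%"
# is included so escapes already present are not double-encoded
# (this assumes the input contains no stray "%" sequences).
uri = quote(leiri, safe=":/?#[]@!$&'()*+,;=%")
print(uri)  # -> http://example.com/a%20b
```

A pure-URI consumer (the XMLNS / RDF case Giovanni mentions) would never apply such a mapping, which is why relaxing those syntaxes buys little.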