On Oct 16, 2012, at 2:09 PM, Anne van Kesteren wrote:

> On Tue, Oct 16, 2012 at 1:44 PM, Jan Algermissen
> <jan.algermis...@nordsc.com> wrote:
>> On Oct 16, 2012, at 1:29 PM, Anne van Kesteren wrote:
>>> I'm not arguing URLs should be allowed to contain SP, just that they
>>> can (and do) in certain contexts and that we need to deal with that
>>> (either by terminating processing or converting it to %20 or ignoring
>>> it in case of domain names, if I remember correctly).
>>
>> I am not understanding your perceived problem with two specs.
>
> I think your context quoting went wrong.
>
>> In addition to that you can standardize 'recovery' algorithms for turning
>> broken URIs to valid ones. Maybe with different 'heuristics levels' before
>> giving up and reporting an error.
>
> The algorithm is not for "fixing up". It's for processing URLs,
> including those that happen to be invalid. The end result is not
> always valid per STD 66.

And there lies the problem. Where is the benefit of producing invalid results as opposed to fixing with best effort?

What can you do with a result that is an invalid URI? You cannot hand it to any tool that implements the URI spec. And anything you are ever going to do with a parsed-but-invalid URI is to treat it as a valid one using a set of assumptions.

Why not simply apply these assumptions in the first place and have a valid URI as a result? That is much cleaner, because the concerns are clearly separated.

>> Any piece of software that wishes to be nice on 'URI providers' and process
>> broken URIs to some extent can apply that standardized algorithm in a fixup
>> phase before handing it on to the component that expects a valid URI.
>
> I do not think it makes sense to have different URL parsers (one with
> a "be strict" bit works).

How you implement that is a detail. If, e.g., an HTML browser intends to apply the fixing algorithm, it can surely do that as part of the URI parsing. The important part is that the result is a valid URI.

> Just like it does not make sense to have two
> different HTML parsers in your software stack.

I did not say that.

Jan

>
> --
> http://annevankesteren.nl/