At http://bugs.python.org/issue754016 there is a discussion about the fact that when a URL is given to urlparse in the usual way (e.g. urlparse('www.python.org')), it is parsed as the path rather than as the net_loc component, which is what browsers commonly do.
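For anyone who wants to reproduce this, here is a minimal sketch. It assumes a Python 3 interpreter, where the function lives in urllib.parse; on Python 2 the import would be "from urlparse import urlparse" instead. The variable names are only for illustration.

```python
# Assumption: Python 3, where urlparse is provided by urllib.parse.
# (On Python 2 the equivalent import is: from urlparse import urlparse)
from urllib.parse import urlparse

# A bare host name has no scheme and no leading '//', so it ends up in path.
bare = urlparse('www.python.org')

# With a leading '//', the same string is taken as the network location.
slashed = urlparse('//www.python.org')

# A full absolute URL fills in both scheme and netloc.
full = urlparse('http://www.python.org/')

for label, parts in (('bare', bare), ('slashed', slashed), ('full', full)):
    print(label, 'scheme=%r netloc=%r path=%r'
          % (parts.scheme, parts.netloc, parts.path))
```

Only the first call puts 'www.python.org' into the path component; the other two report it as the network location.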
The urlparse module tries to follow RFC 1808, which specifies:

<quote_rfc1808>
2.4.3. Parsing the Network Location/Login

If the parse string begins with a double-slash "//", then the substring of characters after the double-slash and up to, but not including, the next slash "/" character is the network location/login (<net_loc>) of the URL.
</quote_rfc1808>

As for treating the URL as a path, the RFC specifies that after parsing the scheme, net_loc, parameters and query, whatever is left is the path:

<quote_rfc1808>
2.4.6. Parsing the Path

After the above steps, all that is left of the parse string is the URL <path> and the slash "/" that may precede it.
</quote_rfc1808>

So, since 'www.python.org' is not a scheme, net_loc (as per the RFC), parameter or query, it is a path. This looks absurd for 'www.python.org' but is exactly right for parsing relative URLs like just 'a'. Moreover, it makes sense when we have relative URLs with parameters and query, e.g. 'g:h', '?x'.

Now the question is: how do we inform users that if they want the net_loc of the URL, they have to put '//' in front? My suggestion is to do it through the docs and the help message.

There has also been a discussion and a suggestion about raising an exception when the URL does not start with '//'. Since the urlparse module is used for handling both absolute and relative URLs, this suggestion would, IMHO, break urlparse's handling of all relative URLs, for example the cases listed in RFC 1808 (Section 5.1, Normal Examples).

Another way to resolve this would be to break urlparse into two methods:

urlparse.absparse()
urlparse.relparse()

and let the user decide which one they want.

Please provide your suggestions on this:

- Is the current method okay?
- Do we feel the need for absparse() and relparse()?

Thanks.
Senthil

--
O.R.Senthil Kumaran
http://uthcode.sarovar.org
