Re: Bug in relative URL handling

Kalin KOZHUHAROV Thu, 23 Jan 2003 23:14:30 -0800

I just realized that I sent this and some other posts directly to the person I was replying to, not to the list...

Gary Hargrave wrote:

wget does not seem to handle relative links in web pages
of the form

http:page3.html


According to my understanding of RFC 1808 this is a valid
URL. When recursively retrieving HTML pages wget ignores
these links without displaying an error or warning.

Well, I am sure it is not a valid relative URL, but it took me some time
to pinpoint where RFC 1808 says so. Otherwise it would be very difficult
to write a URL parser. What you are actually trying to convince us of is
that you can omit the net-location (the part that usually comes in the
middle) and still be able to tell the location. Then how do you interpret
http:program.com ?
Is it the site "program" in the TLD com, or a .com (DOS executable) file
served, who knows why, via HTTP?
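
As a rough illustration of that ambiguity, here is a small sketch using Python's urllib.parse (my choice for the example, not anything wget uses). It only shows how one common parser splits the two spellings; the variable names are made up for illustration.

```python
from urllib.parse import urlsplit

# Without "//" there is no net-location: everything after the scheme is path.
without_netloc = urlsplit("http:program.com")

# With "//" the same text is read as a host name instead.
with_netloc = urlsplit("http://program.com")

print(repr(without_netloc.netloc), repr(without_netloc.path))
print(repr(with_netloc.netloc), repr(with_netloc.path))
```

In the first case "program.com" ends up in the path, in the second it ends up in the net-location, so the parser cannot guess which one was meant once the "//" is dropped.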

One of the places where RFC 1808 discusses this is:

4.  Resolving Relative URLs
...

Step 2b): If the embedded URL starts with a scheme name, it is
     interpreted as an *absolute* URL and we are done.
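
To make step 2b concrete, here is a minimal sketch of the scheme check in Python. It assumes the scheme characters RFC 1808 allows (alphanumerics, "+", "." and "-"); the function names are hypothetical, and this is not wget's actual parser.

```python
import re

# Scheme: one or more of alphanumerics, "+", "." or "-", followed by ":".
_SCHEME_RE = re.compile(r"^([A-Za-z0-9+.-]+):")


def split_scheme(url):
    """Return (scheme, rest); scheme is None if the URL has no scheme."""
    match = _SCHEME_RE.match(url)
    if match:
        return match.group(1).lower(), url[match.end():]
    return None, url


def is_absolute_per_step_2b(embedded_url):
    """Step 2b: a URL that starts with a scheme name is treated as absolute."""
    scheme, _ = split_scheme(embedded_url)
    return scheme is not None


for link in ("http:page3.html", "page3.html", "http://ThinRope.net/"):
    print(link, is_absolute_per_step_2b(link))
```

Under this reading "http:page3.html" is taken as absolute, so it is never merged with the base URL, while plain "page3.html" is.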

BTW, did you try clicking on that link in your browser?

Kalin.

--
||///_ o  *****************************
||//,_/>     WWW: http://ThinRope.net/
|||\ <"   mobile: +81 (90) 6265-0856
|||\\ ' NetPager: [EMAIL PROTECTED]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
