On Sat, 15 Jul 2006 16:36:54 -0700 "Tony Lewis" <[EMAIL PROTECTED]> wrote:
> I don't think that's valid HTML. According to RFC 1866: An HTML user
> agent should treat end of line in any of its variations as a word
> space in all contexts except preformatted text.
>
> I don't see any provision for end of line within the HREF attribute
> of an A tag.
>
> Tony
> _____
>
> From: HUAZHANG GUO [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 11, 2006 7:48 AM
> To: [EMAIL PROTECTED]
> Subject: I got one bug on Mac OS X
>
>
> Dear Sir/Madam,
>
> while I was trying to download using the command:
>
> wget -k -np -r -l inf -E http://dasher.wustl.edu/bio5476/
>
> I got most of the files, but lost some of them.
>
> I think I know where the problem is:
>
> if the link is broken into two lines in the index.html:
>
> <P>Lecture 1 (Jan 17): Exploring Conformational Space for Biomolecules
> <A HREF="http://dasher.wustl.edu/bio5476/lectures
> /lecture-01.pdf">[PDF]</A></P>
>
>
> I will get the following error message:
>
>
> --09:13:16--
> http://dasher.wustl.edu/bio5476/lectures%0A/lecture-01.pdf =>
> `/Users/hguo/mywww//dasher.wustl.edu/bio5476/lectures%0A/lecture-01.pdf'
> Connecting to dasher.wustl.edu[128.252.208.48]:80... connected. HTTP
> request sent, awaiting response... 404 Not Found 09:13:16 ERROR 404:
> Not Found.
>
> Please note that wget adds a special character '%0A' in the URL.
> Maybe the Windows new line has one more character which is not
> recognized by Mac wget.
>
> I am using Mac OS X, Tiger Darwin.

Hello,

I tested the following command:

"wget -k -np -r -l inf -E http://dasher.wustl.edu/bio5476/"

on Fedora Core 5, using wget-1.10.2-3.2.1.

I don't know whether I got every file (I know nothing about the site
I was downloading from), but I did get the file referred to in your
original post: lecture-01.pdf

Here is a link to the full output of wget:
http://www.afolkey2.net/wget.txt

and here is the output for the file that you mentioned as an example:

--19:32:16-- http://dasher.wustl.edu/bio5476/lectures/lecture-01.pdf
Reusing existing connection to dasher.wustl.edu:80.
HTTP request sent, awaiting response... 200 OK
Length: 1755327 (1.7M) [application/pdf]
Saving to: `dasher.wustl.edu/bio5476/lectures/lecture-01.pdf'

1700K .......... .... 100% 462K=3.9s

19:32:20 (438 KB/s) - `dasher.wustl.edu/bio5476/lectures/lecture-01.pdf' saved [1755327/1755327]

For everyone's information, I saw that the link was split into two
lines, just as the OP described. The difference between his experience
and mine is that the file with the split URL that he used as an example
downloaded just fine when I tried it.

It appears that every PDF whose name begins with "lecture-" has a
multi-line URL in the original index.html. In my test, wget downloaded
25 PDF files that had split (multi-line) URLs. This appears to be all
of the PDFs linked from the index.html page.

Steven P. Ulrick
-- 
19:28:50 up 12 days, 23:26, 2 users, load average: 0.84, 0.86, 0.79
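
P.S. To illustrate where the '%0A' in the original report comes from: it is simply what a line feed inside the HREF value looks like once it is percent-encoded. The short Python sketch below is an illustration only and says nothing about how wget is implemented. The helper name clean_href is hypothetical, and stripping CR/LF/tab before resolving the link is an assumption about one reasonable way for a client to handle such markup.

```python
from urllib.parse import quote, urljoin


def clean_href(href: str) -> str:
    """Remove CR, LF and tab characters embedded in an attribute value,
    then trim leading/trailing whitespace (hypothetical helper)."""
    return "".join(ch for ch in href if ch not in "\r\n\t").strip()


base = "http://dasher.wustl.edu/bio5476/"
# The HREF value as it appears in index.html, with the line break inside it.
raw_href = "http://dasher.wustl.edu/bio5476/lectures\n/lecture-01.pdf"

# Percent-encoding the raw value reproduces the URL from the 404 message.
encoded_raw = quote(raw_href, safe=":/")

# Dropping the line break first gives the URL that downloads correctly.
cleaned = urljoin(base, clean_href(raw_href))

print(encoded_raw)  # http://dasher.wustl.edu/bio5476/lectures%0A/lecture-01.pdf
print(cleaned)      # http://dasher.wustl.edu/bio5476/lectures/lecture-01.pdf
```

The first printed URL matches the failing request in the original report, and the second matches the URL that was fetched successfully in my test.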