I got one bug on Mac OS X

2006-07-15 Thread HUAZHANG GUO
Dear Sir/Madam,

While I was trying to download using the command:

    wget -k -np -r -l inf -E http://dasher.wustl.edu/bio5476/

I got most of the files, but lost some of them.

I think I know where the problem is: if the link is broken into two
lines in the index.html:

<P>Lecture 1 (Jan 17): Exploring Conformational Space for Biomolecules
<A HREF="http://dasher.wustl.edu/bio5476/lectures
/lecture-01.pdf">[PDF]</A></P>

I will get the following error message:

--09:13:16--  http://dasher.wustl.edu/bio5476/lectures%0A/lecture-01.pdf
           => `/Users/hguo/mywww//dasher.wustl.edu/bio5476/lectures%0A/lecture-01.pdf'
Connecting to dasher.wustl.edu[128.252.208.48]:80... connected.
HTTP request sent, awaiting response... 404 Not Found
09:13:16 ERROR 404: Not Found.

Please note that wget adds a special character '%0A' in the URL.  Maybe
the Windows newline has one more character which is not recognized by
Mac wget.

I am using Mac OS X Tiger (Darwin).

Thanks!
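
Editorial note: the '%0A' in the failing URL is the percent-encoded form of a line feed (byte 0x0A), which suggests the line break inside the HREF value was carried into the request URL instead of being dropped. The short Python sketch below only illustrates that encoding; it is not wget's code, and the variable names are made up for this note.

```python
from urllib.parse import quote

# The HREF value as it appears in index.html, with the line break kept.
raw_href = "http://dasher.wustl.edu/bio5476/lectures\n/lecture-01.pdf"

# Percent-encode everything except the URL delimiters ':' and '/'.
encoded = quote(raw_href, safe=":/")
print(encoded)

# For comparison: a Windows line ending (CR LF) encodes to two escapes.
windows_newline = quote("\r\n")
print(windows_newline)
```

The first line printed matches the URL in the error message. A Windows-style CR LF would have produced '%0D%0A', so the lone '%0A' points to a plain line feed rather than an extra Windows character.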

RE: I got one bug on Mac OS X

2006-07-15 Thread Tony Lewis



I don't think that's valid HTML. According to RFC 1866: An HTML user
agent should treat end of line in any of its variations as a word space
in all contexts except preformatted text.

I don't see any provision for end of line within the HREF attribute of
an A tag.
 
Tony


From: HUAZHANG GUO [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 11, 2006 7:48 AM
To: [EMAIL PROTECTED]
Subject: I got one bug on Mac OS X

Dear Sir/Madam,
while I was trying to download using the command:

wget -k -np -r -l inf -E http://dasher.wustl.edu/bio5476/

I got most of the files, but lost some of them.

I think I know where the problem is:

if the link is broken into two lines in the index.html:

<P>Lecture 1 (Jan 17): Exploring Conformational Space for Biomolecules
<A HREF="http://dasher.wustl.edu/bio5476/lectures
/lecture-01.pdf">[PDF]</A></P>

I will get the following error message:

--09:13:16-- http://dasher.wustl.edu/bio5476/lectures%0A/lecture-01.pdf
=> `/Users/hguo/mywww//dasher.wustl.edu/bio5476/lectures%0A/lecture-01.pdf'
Connecting to dasher.wustl.edu[128.252.208.48]:80... connected.
HTTP request sent, awaiting response... 404 Not Found
09:13:16 ERROR 404: Not Found.

Please note that wget adds a special character '%0A' in the URL. Maybe
the Windows newline has one more character which is not recognized by
Mac wget.

I am using Mac OS X Tiger (Darwin).


Thanks!

Re: I got one bug on Mac OS X

2006-07-15 Thread Steven P. Ulrick
On Sat, 15 Jul 2006 16:36:54 -0700
"Tony Lewis" <[EMAIL PROTECTED]> wrote:

> I don't think that's valid HTML. According to RFC 1866: An HTML user
> agent should treat end of line in any of its variations as a word
> space in all contexts except preformatted text.
> 
> I don't see any provision for end of line within the HREF attribute
> of an A tag.
>  
> Tony
> 
> From: HUAZHANG GUO [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, July 11, 2006 7:48 AM
> To: [EMAIL PROTECTED]
> Subject: I got one bug on Mac OS X
> 
> 
> Dear Sir/Madam,
> 
> while I was trying to download using the command: 
> 
> wget -k -np -r -l inf -E http://dasher.wustl.edu/bio5476/
> 
> I got most of the files, but lost some of them.
> 
> I think I know where the problem is:
> 
> if the link is broken into two lines in the index.html:
> 
> <P>Lecture 1 (Jan 17): Exploring Conformational Space for Biomolecules
> <A HREF="http://dasher.wustl.edu/bio5476/lectures
> /lecture-01.pdf">[PDF]</A></P>
> 
> 
> I will get the following error message:
> 
> 
> --09:13:16-- http://dasher.wustl.edu/bio5476/lectures%0A/lecture-01.pdf
> => `/Users/hguo/mywww//dasher.wustl.edu/bio5476/lectures%0A/lecture-01.pdf'
> Connecting to dasher.wustl.edu[128.252.208.48]:80... connected.
> HTTP request sent, awaiting response... 404 Not Found
> 09:13:16 ERROR 404: Not Found.
> 
> Please note that wget adds a special character '%0A' in the URL.
> Maybe the Windows newline has one more character which is not
> recognized by Mac wget.
> 
> I am using Mac OS X Tiger (Darwin).

Hello,
I tested the following command:
"wget -k -np -r -l inf -E http://dasher.wustl.edu/bio5476/"
on Fedora Core 5, using wget-1.10.2-3.2.1.

I don't know if I got every file or not (since I know nothing about the
site that I downloaded), but I did get the file referred to in your
original post: lecture-01.pdf

Here is a link to the full output of wget:
http://www.afolkey2.net/wget.txt

and here is the output for the file that you mentioned as an example:
--19:32:16--  http://dasher.wustl.edu/bio5476/lectures/lecture-01.pdf
Reusing existing connection to dasher.wustl.edu:80.
HTTP request sent, awaiting response... 200 OK
Length: 1755327 (1.7M) [application/pdf]
Saving to: `dasher.wustl.edu/bio5476/lectures/lecture-01.pdf'
1700K ..    100% 462K=3.9s
19:32:20 (438 KB/s) - `dasher.wustl.edu/bio5476/lectures/lecture-01.pdf' saved [1755327/1755327]

For everyone's information, I saw that the link was split into two
lines just like the OP described.  The difference between his
experience and mine, though, was that the file with a split URL that he
used as an example downloaded just fine when I tried it.  It appears
that every PDF whose name begins with "lecture-" has a multi-line URL
in the original index.html.  In my experiment, wget downloaded 25 PDF
files that had split (multi-line) URLs.  This appears to be all of them
that are linked from the index.html page.

Steven P. Ulrick
-- 
 19:28:50 up 12 days, 23:26,  2 users,  load average: 0.84, 0.86, 0.79
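
Editorial note: to check which links on a page are affected, one could scan the saved index.html for HREF values that contain a line break. The sketch below uses Python's standard html.parser module and assumes it keeps the raw line break in the attribute value. The class name SplitHrefFinder and the second "Home" link in the sample are made up for illustration; only the first link comes from the thread.

```python
from html.parser import HTMLParser


class SplitHrefFinder(HTMLParser):
    """Collect HREF values of <a> tags that contain a line break."""

    def __init__(self):
        super().__init__()
        self.split_hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and ("\n" in value or "\r" in value):
                self.split_hrefs.append(value)


# Sample markup: the first link is the one quoted in the report,
# the second is a hypothetical single-line link for contrast.
sample = (
    "<P>Lecture 1 (Jan 17): Exploring Conformational Space for Biomolecules\n"
    '<A HREF="http://dasher.wustl.edu/bio5476/lectures\n'
    '/lecture-01.pdf">[PDF]</A></P>\n'
    '<P><A HREF="http://dasher.wustl.edu/bio5476/">Home</A></P>\n'
)

finder = SplitHrefFinder()
finder.feed(sample)
finder.close()

for href in finder.split_hrefs:
    print(repr(href))
```

Only the split link is reported; the single-line link is ignored. Running the same finder over the real index.html would list every link that a client keeping the line break would request with '%0A' in the path.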


Re: I got one bug on Mac OS X

2006-07-16 Thread Hrvoje Niksic
"Tony Lewis" <[EMAIL PROTECTED]> writes:

> I don't think that's valid HTML. According to RFC 1866: An HTML user
> agent should treat end of line in any of its variations as a word
> space in all contexts except preformatted text.  I don't see any
> provision for end of line within the HREF attribute of an A tag.

Unrelated to this particular bug, please note that rfc1866 is not the
place to look for an up-to-date HTML specification.  HTML has been
maintained by W3C for many years, so it's best to look there,
e.g. HTML 4.01 spec, or possibly XHTML.
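
Editorial note: the HTML 4.01 spec does cover this case. Section 6.2 says that, for CDATA attribute values such as HREF, user agents should ignore line feeds and replace each carriage return or tab with a single space, and may ignore leading and trailing white space. A minimal Python sketch of those whitespace rules follows. The function name is made up for this note, character-entity replacement is left out, and nothing here is taken from wget's source.

```python
def normalize_cdata_attribute(value: str) -> str:
    """Apply the HTML 4.01 section 6.2 whitespace rules to an attribute value."""
    # Rule: ignore line feeds.
    value = value.replace("\n", "")
    # Rule: replace each carriage return or tab with a single space.
    value = value.replace("\r", " ").replace("\t", " ")
    # Optional per the spec: drop leading and trailing white space.
    return value.strip()


# The HREF value from the report, with its embedded line feed.
href = "http://dasher.wustl.edu/bio5476/lectures\n/lecture-01.pdf"
normalized = normalize_cdata_attribute(href)
print(normalized)
```

With the line feed dropped, the value becomes the URL that was fetched successfully in the Fedora test earlier in the thread.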


RE: I got one bug on Mac OS X

2006-07-16 Thread Tony Lewis
Hrvoje Niksic wrote:

> HTML has been maintained by W3C for many years 

I knew that (but forgot) -- just went to ietf.org out of habit looking for
Internet specifications.

Tony



Re: I got one bug on Mac OS X

2006-07-17 Thread HUAZHANG GUO

Thanks, then I am sure that is a Mac OS X Tiger specific problem.


