Funk Gabor wrote:
>> HTTP does not provide a dirlist command, so wget parses html to find
>> other files it should download. Note: HTML not XML. I suspect that
>> is the problem.
>
> If wget didn't download the rest at all, I would say that too. But first
> the directory gets created and the xml is downloaded (along with some
> *.gif files in another directory), so wget clearly "senses" the
> directory. If I then issue wget -m site/dir, all of the rest comes down
> (index.html?D=A and the others too), so wget is able to get everything,
> just not in one pass. So there is no technical limitation preventing
> wget from doing it in one step. It is either a missing feature (shall I
> say a "bug", since wget cannot make the mirror it clearly could) or I
> was unable to find the switch that makes it happen at once.

Hmm, now I see. The vast majority of websites are configured to deny directory
listings, which is probably why wget doesn't bother to request them except for
the directory specified as the root of the download. I don't think there is an
option to do this for every directory, because it's not really needed. The
_real_ bug is that wget fails to parse what look like valid
<img ... src="..." ...> tags. Perhaps someone more familiar with wget's HTML
parsing code could investigate? The command is

    wget -r -l0 www.jeannette.hu/saj.htm

and the files it ignores are a number of image files.
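For whoever looks into it, here is one rough way to cross-check: a small
script that simply lists every <img src> URL the page contains, so the list
can be compared against the files wget actually fetched. It is only a sketch
(it assumes Python is available; the URL is the one from the command above):

    #!/usr/bin/env python3
    # Sketch: enumerate the <img src> URLs on the page so they can be
    # compared with what wget downloaded. The page URL is taken from
    # the command in this thread.
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    PAGE = "http://www.jeannette.hu/saj.htm"

    class ImgCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.srcs = []

        def handle_starttag(self, tag, attrs):
            # attrs is a list of (name, value) pairs for the start tag
            if tag == "img":
                for name, value in attrs:
                    if name == "src" and value:
                        self.srcs.append(urljoin(PAGE, value))

    html = urlopen(PAGE).read().decode("latin-1")
    collector = ImgCollector()
    collector.feed(html)
    for url in collector.srcs:
        print(url)

Any image URL printed here but missing from the download tree would point at
the tag wget's parser is tripping over.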

Max.
