Hi all,

While trying to use wget to mirror an FTP site exposed as a website (other options such as FTP or rsync are not available to us), we are running into problems with the handling of index.html files.

Apache httpd's mod_autoindex adds a number of handy links across the top of a directory listing, allowing the end user to re-sort the listing by any column, ascending or descending.

The problem is that wget follows every one of these sort links, meaning that instead of fetching the index of a directory once, it fetches it a total of 9 times.
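
To make the count concrete, here is a rough Python sketch of the URLs involved. It assumes the default fancy-index columns (Name, Last modified, Size, Description) and the usual `?C=...;O=...` query format; the host and path are placeholders, and the function name is made up for illustration.

```python
from itertools import product

# Assumption: default fancy-index columns (Name, Last modified, Size, Description).
COLUMNS = ["N", "M", "S", "D"]
# Assumption: two sort orders per column (ascending, descending).
ORDERS = ["A", "D"]


def index_variants(directory_url):
    """Return the plain listing URL plus every sorted variant linked from it."""
    variants = [directory_url]
    for column, order in product(COLUMNS, ORDERS):
        variants.append(f"{directory_url}?C={column};O={order}")
    return variants


# Placeholder URL, not the real site.
urls = index_variants("http://example.com/pub/")
print(len(urls))  # 1 plain listing + 4 columns * 2 orders = 9
```

Every one of those URLs returns the same listing, just sorted differently, so eight of the nine requests per directory are redundant.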

I tried to use the -R option to suppress the download of index.html files. The index is downloaded on the request for the directory anyway, so there is no need to follow any links containing index.html. However, wget downloads these index.html links anyway, and only deletes them afterwards.

The end result is that unnecessary load is placed on the server, a lot of bandwidth is wasted on the client side, and the mirror process takes forever to complete.

Does wget support a true "do not follow these links ever" option?

Regards,
Graham