Re: wget follow-excluded patch

Hrvoje Niksic Sun, 10 Apr 2005 15:05:18 -0700

Tobias Tiederle <[EMAIL PROTECTED]> writes:

> let's say you have the following structure:
>
> index.html
> |-cool.html
> |  |-page1.html
> |  |-page2.html
> |  |-  ...
> |
> |-crap.html
>    |-page1.html
>    |-page2.html
>
> now you want to download the whole structure, but you want to
> exclude the crap (with -R/A or nice regex).  If you look at recur.c,
> crap.html is downloaded (and deleted), but all the pages linked in
> crap.html will be downloaded as well.  With the option I included,
> all the crap will be totally ignored.  I don't know how to achieve
> this behaviour with the current options.


You can't.  -R/-A were never meant to be used that way -- witness the
FTP code, where they're not applied to directories either.  (In this
sense HTML files are "directories" of a kind.)

Maybe we could repurpose -I/-X so they can apply to HTML files and be
used to ignore whole sub-hierarchies of the site?  Although a bit
unorthodox, that would be very much within the jurisdiction of those
options.
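A toy model, not wget's recur.c, may make the distinction concrete. It assumes a hypothetical link graph shaped like Tobias's example and a hypothetical `crawl` function. Patterns are matched against the file's basename, which is also an assumption. The `reject` list mimics the current -R behaviour described above: a rejected HTML page is still fetched so its links can be followed, and is deleted afterwards. The `exclude` list mimics the proposed use of -X on HTML files: an excluded page is neither fetched nor followed, so its whole sub-hierarchy disappears.

```python
from collections import deque
from fnmatch import fnmatch
import posixpath

# Hypothetical link graph modelled on the example in the quoted message.
SITE = {
    "index.html": ["cool.html", "crap.html"],
    "cool.html": ["cool/page1.html", "cool/page2.html"],
    "crap.html": ["crap/page1.html", "crap/page2.html"],
    "cool/page1.html": [],
    "cool/page2.html": [],
    "crap/page1.html": [],
    "crap/page2.html": [],
}


def crawl(site, start, reject=(), exclude=()):
    """Breadth-first toy crawl returning (fetched, kept) page lists.

    reject:  -R-like patterns; the page is fetched and its links are
             followed, but the file is not kept (downloaded, then deleted).
    exclude: hypothetical -X-on-HTML patterns; the page is skipped
             entirely, so nothing below it is ever reached.
    """
    fetched, kept = [], []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        name = posixpath.basename(page)  # assumption: match on basename only
        if any(fnmatch(name, pat) for pat in exclude):
            continue  # prune: do not fetch, do not follow links
        fetched.append(page)
        if not any(fnmatch(name, pat) for pat in reject):
            kept.append(page)
        for link in site.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched, kept


if __name__ == "__main__":
    print("-R style:", crawl(SITE, "index.html", reject=["crap.html"]))
    print("-X style:", crawl(SITE, "index.html", exclude=["crap.html"]))
```

With `reject`, crap/page1.html and crap/page2.html are still kept, which is the complaint in the quoted message. With `exclude`, nothing under crap.html is fetched at all.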