Tobias Tiederle <[EMAIL PROTECTED]> writes:

> let's say you have the following structure:
>
> index.html
> |-cool.html
> | |-page1.html
> | |-page2.html
> | |- ...
> |
> |-crap.html
>   |-page1.html
>   |-page2.html
>
> now you want to download the whole structure, but you want to
> exclude the crap (with -R/A or nice regex). If you look at recur.c,
> crap.html is downloaded (and deleted), but all the pages linked in
> crap.html will be downloaded as well. With the option I included,
> all the crap will be totally ignored. I don't know how to achieve
> this behaviour with the current options.
You can't. -R/-A were never meant to be used that way -- witness the FTP code, where they're not applied to directories either. (In this sense HTML files are "directories" of a kind.) Maybe we could repurpose -I/-X so they can apply to HTML files and be used to ignore whole sub-hierarchies of the site? Although a bit unorthodox, that would be very much within the jurisdiction of those options.
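
To make the distinction concrete, here is a toy sketch in Python. It is not Wget code: the link graph, the `crawl` function, the `prune` flag, and the `cool/` and `crap/` path prefixes (added only to keep the page names distinct) are all made up for illustration. With `prune=False` a rejected HTML page is still parsed for links before being dropped, as described above for -R; with `prune=True` it is skipped outright, so nothing beneath it is ever queued.

```python
from collections import deque
from fnmatch import fnmatch
import posixpath

# Hypothetical link graph mirroring the structure in the quoted message.
LINKS = {
    "index.html": ["cool.html", "crap.html"],
    "cool.html": ["cool/page1.html", "cool/page2.html"],
    "crap.html": ["crap/page1.html", "crap/page2.html"],
}


def crawl(start, reject, prune):
    """Breadth-first walk of LINKS; return the pages that would be kept.

    reject is a shell-style pattern matched against the file name only.
    prune=False: a rejected page is still parsed for links, then dropped.
    prune=True:  a rejected page is skipped, so its links are never seen.
    """
    kept = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        rejected = fnmatch(posixpath.basename(page), reject)
        if rejected and prune:
            continue  # ignore the page and everything reachable only through it
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        if not rejected:
            kept.append(page)
    return kept


if __name__ == "__main__":
    print(crawl("index.html", "crap*", prune=False))
    print(crawl("index.html", "crap*", prune=True))
```

In the first call the pages under crap.html still end up in the result, because their own names do not match the pattern; in the second they never appear. Something like the second mode is what an -I/-X that applies to HTML files would have to provide.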