Hello,

Based on how the O'Reilly books discuss spidering, and on some of my own use cases, I believe wget is missing an integration feature. Basically, I'd like to be able to pass candidate URLs through an external script before they are retrieved.
Such a script could:

* ignore URLs that match a regex
* rewrite URLs, e.g. example.com/a/b/url?c/d => example.com/c/d
* use an external program to extract links from SWFs
* pick out JPGs/AVIs/... to fetch first

Please help me find a way to integrate something like this. My current thought is a --pretend option that prints URLs instead of fetching them, but I suspect that will ultimately need FIFOs and all kinds of plumbing. Does anyone have an idea of how this might be done?

Thanks,
Will Entriken
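To make the idea concrete, here is a rough sketch of what such an external filter could look like, assuming wget (or a --pretend pipeline) feeds it candidate URLs on stdin, one per line, and reads approved URLs back from stdout. The hook itself, the example.com rewrite rule, and the regexes are all hypothetical, just to illustrate the use cases above:

```python
#!/usr/bin/env python3
# Hypothetical external URL filter for wget: reads candidate URLs on stdin
# (one per line), writes approved, possibly rewritten URLs to stdout.
# The wget integration point is assumed, not an existing option.
import re
import sys

# example: drop stylesheets and scripts (ignore URLs matching a regex)
IGNORE = re.compile(r"\.(css|js)$")

def rewrite(url: str) -> str:
    # example rewrite: example.com/a/b/url?c/d => example.com/c/d
    m = re.match(r"^(https?://example\.com)/a/b/url\?(.*)$", url)
    if m:
        return f"{m.group(1)}/{m.group(2)}"
    return url

def main(infile, outfile) -> None:
    media, other = [], []
    for line in infile:
        url = line.strip()
        if not url or IGNORE.search(url):
            continue  # skip blank lines and ignored URLs
        url = rewrite(url)
        # pick out JPGs/AVIs first by emitting them ahead of everything else
        (media if re.search(r"\.(jpe?g|avi)$", url) else other).append(url)
    for url in media + other:
        print(url, file=outfile)

if __name__ == "__main__":
    main(sys.stdin, sys.stdout)
```

With a --pretend option, this could then be wired up in the shell as something like `wget --pretend -r URL | ./urlfilter.py | wget -i -`, avoiding FIFOs for the simple one-pass case.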