Dennis Heuer <[EMAIL PROTECTED]> writes:

> Your answer fits only half because I still have to choose -Ahtml,pdf
> and I still get *at least* the first HTML page on my disk

The first HTML page will only be saved temporarily.  You still
shouldn't need to use -Ahtml,pdf instead of just -Apdf.

> (try a page like this and you will see that you get a lot of
> unwanted pages on your disk:
> http://web.worldbank.org/external/default/main?theSitePK=258644&menuPK=258666&region=119222&pagePK=51083064&piPK=51246258)

The first problem with this page is that the PDFs are off-site, so
you need to use -H to have Wget retrieve them.  To avoid creating
spurious directories, I recommend -nd, and to avoid deep recursion,
-l1 is needed.  This amounts to:

    wget -H -rl1 -nd -A.pdf 'http://web.worldbank.org/external/default/main?theSitePK=258644&menuPK=258666&region=119222&pagePK=51083064&piPK=51246258'

The other problem with this page is that it links to a lot of pages
without a ".html" suffix in their URLs, such as
http://www.worldbank.org/.  -A bogusly doesn't reject these because it
considers them to be "directories" rather than files.  I'm not sure if
that's exactly a bug, but it certainly doesn't look like a feature.
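
Until -A handles such "directory" URLs, one possible workaround is to
clean up after the download.  The following is only a sketch, not
anything Wget provides: prune_non_pdf is a hypothetical helper name,
and it assumes the download was run with -nd in a directory used for
nothing else, since it deletes every non-PDF file it finds there.

```shell
# Hypothetical helper (not part of Wget): remove every regular file
# directly inside the given directory whose name does not end in
# .pdf or .PDF.  Subdirectories and dotfiles are left alone.
prune_non_pdf() {
    dir=${1:-.}
    for f in "$dir"/*; do
        # Skip directories and the unexpanded pattern of an empty dir.
        [ -f "$f" ] || continue
        case "$f" in
            *.pdf|*.PDF) ;;          # keep the PDFs
            *) rm -f -- "$f" ;;      # drop leftover pages
        esac
    done
}

# Intended use, in a fresh directory, after the wget command has run:
#   prune_non_pdf .
```

Doing the cleanup in a separate step keeps the Wget invocation itself
unchanged, at the cost of still fetching the unwanted pages first.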