Hello, I want to archive an HTML page and « all the files that are necessary to properly display » it (Wget manual), plus all the linked images (<a href="linked_image_url"><img src="inlined_image_url"></a>). I have tried most of the options and features: recursive archiving, including and excluding directories and file types... but I can't come up with the right options to archive only the "index.html" page from the following hierarchy:
/pages/index.html         ; displays image_1.png, links to image_2.png
/pages/page_1.html        ; linked from index.html
/pages/images/image_1.png
/images/image_2.png

Consider image_2.png a thumbnail of image_1.png; that's why it's so important to archive it. The archive I want to get:

/pages/index.html
/pages/images/image_1.png
/images/image_2.png

If I set -r -l1 (recursion on, depth 1) and -p (--page-requisites: necessary files), I also get page_1.html, which I don't want. And it seems that excluding the /pages directory, or including only "png" files, doesn't affect the behaviour of -p.

How can I force Wget not to archive the page_1.html file? At the very least, I would like Wget to clean the archive at the end. Note also that the page I'm trying to archive links to many pages that I want to exclude from the archive, so I can't afford to clean it up manually.

JM.
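For reference, here is a sketch of the kind of command I have been trying (example.com is a placeholder for the real host, and the -R pattern is only my guess at how the rejection should be expressed):

```shell
# Recursion depth 1, page requisites, and a reject pattern
# hoping to drop page_1.html (and similar linked pages).
# "http://example.com" stands in for the actual site.
wget -r -l1 -p -R 'page_*.html' http://example.com/pages/index.html
```

As far as I understand the manual, -A/-R take suffixes or shell-style patterns, but rejected HTML files may still be fetched so that Wget can parse them for further links, which is why I'd settle for having them cleaned up from the archive afterwards.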