Jörn Nettingsmeier wrote:
hi everyone !

i'm trying to set up a website monitoring tool for a university research project. the idea is to use wget to archive politicians' websites once a week, so that we can analyse their campaigns during the last 4 weeks before the election.
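to make the setup concrete, here is a rough sketch of how a weekly snapshot could be started from a small python wrapper. the function name, the example url and the archive path are made up for illustration; the wget options are standard long options, but the exact set that suits a given site is an assumption.

```python
import datetime


def build_wget_command(url, archive_root, day):
    """Return the wget argument list for one dated snapshot of `url`.

    Hypothetical helper: each run is stored under archive_root/YYYY-MM-DD
    so that weekly snapshots do not overwrite each other.
    """
    snapshot_dir = f"{archive_root}/{day.isoformat()}"
    return [
        "wget",
        "--recursive",          # follow links within the site
        "--page-requisites",    # also fetch images, stylesheets, etc.
        "--convert-links",      # rewrite links for local browsing
        "--no-parent",          # never ascend above the start directory
        f"--directory-prefix={snapshot_dir}",
        url,
    ]


if __name__ == "__main__":
    # Placeholder url and path; print the command instead of running it.
    cmd = build_wget_command(
        "http://www.example.org/", "/srv/archive", datetime.date.today()
    )
    print(" ".join(cmd))
```

the list could be handed to `subprocess.run` from a weekly cron job; printing it first makes it easy to check the command by hand.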

i have hit a few snags, and i would welcome comments.
my wget is a binary release that was shipped with suse linux 9.2 ("GNU Wget 1.9+cvs-dev"), architecture is i386.

i just confirmed all three issues with the latest cvs.


wget spans hosts when it shouldn't:


wget seems to choke on directories whose names start with a dot. i guess it treats them as references to external pages, because it does not download links that contain such directory names.
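for comparison, this is what standard relative-url resolution gives for such links. it is only a sketch using python's `urllib.parse`, not a description of what wget does internally; the host and path names are placeholders.

```python
from urllib.parse import urljoin, urlsplit


def resolve_and_check_host(base_url, href):
    """Resolve `href` against `base_url` and report whether the host is unchanged."""
    absolute = urljoin(base_url, href)
    same_host = urlsplit(absolute).hostname == urlsplit(base_url).hostname
    return absolute, same_host


if __name__ == "__main__":
    base = "http://www.example.org/campaign/index.html"  # placeholder page
    for href in (".news/latest.html",
                 "./.news/latest.html",
                 "http://other.example.net/x"):
        print(href, "->", resolve_and_check_host(base, href))
```

a link into a dot directory resolves to the same host as the page that contains it, so a recursive download that stays on one host would be expected to follow it.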


wget does not parse css stylesheets and consequently does not retrieve url() references, which leads to missing background graphics on some sites.
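as a stopgap, the missing references could be collected in a post-processing step. the following is a minimal sketch, assuming a simple regular expression is good enough for the stylesheets in question; it is not a full css parser (it ignores comments and escaped characters), and the function name and example urls are made up.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the target.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_url_references(css_text, stylesheet_url):
    """Return absolute urls for every url() reference found in css_text.

    Relative references are resolved against the stylesheet's own url,
    not against the html page that includes it.
    """
    refs = []
    for match in CSS_URL_RE.finditer(css_text):
        target = match.group(2).strip()
        # Skip empty targets and inline data: uris, which need no download.
        if target and not target.startswith("data:"):
            refs.append(urljoin(stylesheet_url, target))
    return refs


if __name__ == "__main__":
    css = 'body { background: url("../img/bg.png"); }'
    for ref in css_url_references(css, "http://www.example.org/css/site.css"):
        print(ref)
```

the resulting list could be written to a file and fetched in a second wget run with `--input-file`.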

ps: please retain the cc: list. thanks.


