Jörn Nettingsmeier wrote:
hi everyone!


i'm trying to set up a website monitoring tool for a university research project. the idea is to use wget to archive politicians' websites once a week so we can analyse their campaigns during the last 4 weeks before the election.
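A minimal Python sketch of such a weekly job, assuming a hypothetical site list and election date. It computes one snapshot date per week over the 4 weeks before the election and builds a wget command that mirrors each site, including page requisites, into a folder named after the date. The flags (--mirror, --page-requisites, --convert-links, --directory-prefix) are standard wget options. The folder layout and the example values are assumptions, not part of the original setup.

```python
import datetime
from pathlib import Path
from typing import List

# Hypothetical values for illustration only.
SITES = ["http://www.example.org/"]
ELECTION_DAY = datetime.date(2005, 9, 18)


def snapshot_dates(election: datetime.date, weeks: int = 4) -> List[datetime.date]:
    """One snapshot per week, oldest first, the last one a week before election day."""
    return [election - datetime.timedelta(weeks=w) for w in range(weeks, 0, -1)]


def wget_command(url: str, day: datetime.date, root: str = "archive") -> List[str]:
    """Build (but do not run) a wget call that mirrors `url` into root/<date>/."""
    target = Path(root) / day.isoformat()
    return [
        "wget",
        "--mirror",           # recursion plus time-stamping, infinite depth
        "--page-requisites",  # also fetch images/stylesheets needed to render pages
        "--convert-links",    # rewrite links so the snapshot can be browsed offline
        f"--directory-prefix={target}",
        url,
    ]


if __name__ == "__main__":
    for day in snapshot_dates(ELECTION_DAY):
        for site in SITES:
            print(" ".join(wget_command(site, day)))
```

In a real setup each command could be run with subprocess.run(cmd, check=True) from a weekly cron job, using today's date instead of the precomputed list.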


i have hit a few snags, and i would welcome comments.
my wget is a binary release that was shipped with suse linux 9.2 ("GNU Wget 1.9+cvs-dev"), architecture is i386.

i have just confirmed all three issues with the latest cvs.

[1]

wget spans hosts when it shouldn't:

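For reference, recursive wget is supposed to stay on the starting host unless host spanning is switched on with -H (--span-hosts). A minimal Python sketch of that same-host rule, assuming a plain comparison of hostnames after resolving relative links (real wget also honours -D domain lists, which this sketch ignores; the URLs are hypothetical):

```python
from urllib.parse import urljoin, urlsplit


def stays_on_host(page_url: str, link: str) -> bool:
    """True if `link`, resolved against `page_url`, points to the same host."""
    page_host = urlsplit(page_url).hostname   # .hostname is lowercased
    link_host = urlsplit(urljoin(page_url, link)).hostname
    return link_host == page_host


if __name__ == "__main__":
    page = "http://www.example.org/index.html"
    for link in ["/about.html", "http://WWW.EXAMPLE.ORG/news/", "http://ads.example.net/"]:
        print(link, stays_on_host(page, link))
```

With this rule, a link to another host is only followed when spanning has been explicitly allowed.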

[2]

wget seems to choke on directories whose names start with a dot. i guess it thinks they are references to external pages, and it does not follow links containing such directory names.
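A directory whose name merely begins with a dot (a hypothetical .images/, say) is an ordinary path segment; only "." and ".." have special meaning in relative URLs (RFC 3986). A short Python check with urllib.parse.urljoin, using made-up page and link names, shows that such links resolve to the same host and so should not be treated as external:

```python
from typing import List
from urllib.parse import urljoin, urlsplit

# Hypothetical page and links, for illustration only.
PAGE = "http://www.example.org/campaign/index.html"
LINKS = [".images/poster.jpg", "../.news/item.html", "./.css/site.css"]


def resolve_all(page_url: str, links: List[str]) -> List[str]:
    """Resolve each relative link against the page URL; '.' and '..' segments are removed."""
    return [urljoin(page_url, link) for link in links]


if __name__ == "__main__":
    for absolute in resolve_all(PAGE, LINKS):
        # Each result keeps the page's host; the dot-prefixed directory name survives intact.
        print(absolute, urlsplit(absolute).hostname)
```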

[3]

wget does not parse css stylesheets and consequently does not retrieve url() references, which leads to missing background graphics on some sites.
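Until wget handles this itself, one possible workaround is to scan the downloaded stylesheets for url() references and feed the resulting list back to wget (for instance via its -i/--input-file option). A rough Python sketch, assuming a simplified regular expression rather than a real CSS parser: it skips data: URIs and does not handle @import "..." without url(), escaped characters, or comments. Note that relative references in a stylesheet resolve against the stylesheet's own URL, not the page's.

```python
import re
from typing import List
from urllib.parse import urljoin

# url(...) with optional single or double quotes; a simplification of the CSS grammar.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_urls(css_text: str, stylesheet_url: str) -> List[str]:
    """Return absolute URLs referenced via url() in a stylesheet."""
    found = []
    for match in CSS_URL.finditer(css_text):
        ref = match.group(2).strip()
        if ref and not ref.startswith("data:"):
            # Relative to the stylesheet's location, not the HTML page.
            found.append(urljoin(stylesheet_url, ref))
    return found


if __name__ == "__main__":
    css = 'body { background: url("img/bg.png"); } h1 { background: url( ../logo.gif ) }'
    for url in css_urls(css, "http://www.example.org/css/site.css"):
        print(url)
```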


ps: please retain the cc: list. thanks.



regards,

jörn


