Re: Using --spider to check for dead links?

2006-07-17 Thread Stefan Melbinger

Hi,

First of all, thanks for the quick answer! :)

On 18.07.2006 17:34, Mauro Tortonesi wrote:

Stefan Melbinger wrote:
I need to check whole websites for dead links, with output that is easy
to parse for lists of dead links, statistics, etc. Does anybody have
experience with this problem, or has anybody perhaps used --spider mode
for this before (as suggested by some pages)?

historically, wget never really supported recursive --spider mode. 
fortunately, this has been fixed in 1.11-alpha-1:


How will wget behave when started in recursive --spider mode? It will 
have to download, parse and then discard HTML pages in order to know 
where to go next, but what happens with images and large files such as 
videos, for example? Will wget check whether they exist?
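
(For a single URL, the kind of existence check I am hoping for already 
seems possible today; the host and file name below are just placeholders:

  wget --spider http://www.example.com/videos/large-video.avi

As far as I understand, this only asks the server whether the file is 
there and does not download its body.)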


Thanks a lot,
  Stefan

PS: The background for my question is that my company wants to check 
large websites for dead links (without using any commercial software). 
Hours of Google searching left me with wget, which seems to have the 
best foundation for this...


Re: Using --spider to check for dead links?

2006-07-17 Thread Mauro Tortonesi

Stefan Melbinger wrote:

Hello,

I need to check whole websites for dead links, with output that is easy
to parse for lists of dead links, statistics, etc. Does anybody have
experience with this problem, or has anybody perhaps used --spider mode
for this before (as suggested by some pages)?


For this to work, all HTML pages would have to be downloaded and parsed 
completely, while images and other files should only be checked for 
existence with HEAD requests (in order to save bandwidth)...


Using --spider and --spider -r did not turn out to be the right way to do this, I fear.

Any help is appreciated, thanks in advance!


hi stefan,

historically, wget never really supported recursive --spider mode. 
fortunately, this has been fixed in 1.11-alpha-1:


http://www.mail-archive.com/wget@sunsite.dk/msg09071.html

so, it will be included in the upcoming 1.11 release.
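
once 1.11 is out, something along these lines should be a reasonable 
starting point (just a sketch; spider.log and the example host are 
placeholders, and the exact wording in the log may differ between versions):

  wget --spider -r -o spider.log http://www.example.com/
  grep -B 4 '404 Not Found' spider.log

the first command walks the site without saving any files and writes the 
full log to spider.log; the second lists the "404 Not Found" responses 
together with a few preceding log lines, which normally include the URL 
that failed.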

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.     http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool   http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux             http://www.deepspace6.net
Ferrara Linux User Group                  http://www.ferrara.linux.it


Using --spider to check for dead links?

2006-07-17 Thread Stefan Melbinger

Hello,

I need to check whole websites for dead links, with output that is easy
to parse for lists of dead links, statistics, etc. Does anybody have
experience with this problem, or has anybody perhaps used --spider mode
for this before (as suggested by some pages)?


For this to work, all HTML pages would have to be downloaded and parsed 
completely, while images and other files should only be checked for 
existence with HEAD requests (in order to save bandwidth)...
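
To illustrate the kind of per-file check I mean (purely hypothetical; 
urls.txt would be a list of extracted image/file URLs and check.log the 
resulting report):

  wget --spider -i urls.txt -o check.log

Each URL in the list would then only be probed for existence, without 
downloading it.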


Using --spider and --spider -r did not turn out to be the right way to do this, I fear.

Any help is appreciated, thanks in advance!

Greets,
  Stefan Melbinger