Folks,
I am passing the outgoing queue of requests from a disconnected LAN
back to a well-connected machine for wwwoffle -fetch.
After fetching, I am packing up the fetched files for delivery back
to the LAN machine for serving.
In order to locate all the files created by this fetch session,
including embedded images and recursive fetches, I am parsing the
output of wwwoffle -fetch with a Perl script.
So far, so good.
The problem that now arises is that if a second batch of requests
arrives for another remote machine, the requests all pile up in
the same outgoing directory, and my Perl script can no longer
tell the batches apart.
I could serialise all instances, but that slows things considerably.
I am considering adding a --outgoing <dir> option to the wwwoffle program,
allowing me to use the same file cache but parse each batch separately.
Is there a better way to do it?
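(If I do end up serialising after all, a flock(1) wrapper would at least
make it safe. This is only a sketch, assuming Linux with util-linux
flock; the echo is a stand-in for the real
"wwwoffle -fetch 2>&1 | perl fetchfiles.pl" pipeline.)

```shell
# Run each batch under an exclusive lock so only one batch at a time
# owns the outgoing directory.  Requires flock(1) from util-linux.
lock=/tmp/wwwoffle-fetch.lock
(
    flock 9                 # block until we hold the lock on fd 9
    echo "batch fetched"    # stand-in for: wwwoffle -fetch | perl fetchfiles.pl
) 9>"$lock"
```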
I attach my little perl script.
Cheers, Andy!
On Fri, 22 Feb 2002, Andy Rabagliati wrote:
> Hmm - but this will not pick up the associated images and things ?
>
> Seems I will need to parse the wwwoffled logs ?
>
> Cheers, Andy!
>
> On Fri, 22 Feb 2002, Andy Rabagliati wrote:
>
> > Thanks very much.
> >
> > After some experimentation, it seems it is pretty easy ..
> >
> > [root@vision2000 wwwoffle-2.6d]# wwwoffle-ls outgoing
> > OL5cExynn7TtBZHt9CtZJ-g 349 Feb 22 13:15 http://www.yahoo.com/
> > OkuAAiJm7MiuHw9aecXwXyw 695 Feb 22 13:16 http://news.bbc.co.uk/low/english/world/africa/default.stm
> >
> > Gives me a hashed filename starting with O.
> >
> > After fetching these URLs, I get this :-
> >
> > [root@vision2000 wwwoffle-2.6d]# for u in http://www.yahoo.com/ http://news.bbc.co.uk/low/english/world/africa/default.stm ; do wwwoffle-ls $u ; done
> > DL5cExynn7TtBZHt9CtZJ-g 16790 Feb 22 14:03 http://www.yahoo.com/
> > DkuAAiJm7MiuHw9aecXwXyw 13657 Feb 21 12:52 http://news.bbc.co.uk/low/english/world/africa/default.stm
> >
> > The same hashed names but beginning with D.
#! /usr/bin/perl -w
# Parses the output of wwwoffle -fetch to pick up the files arriving
# in the cache.
# Andy Rabagliati <[EMAIL PROTECTED]>

use strict;

sub filetest {
    my ($site, $url) = @_;
    return if $url =~ /\n/ || length ($url) == 0;
    # Ask wwwoffle-ls for the cache entry; the first field is the hashed name.
    my $hash = qx('/usr/local/bin/wwwoffle-ls' "http://$site/$url");
    $hash =~ s/ .*//;       # keep only the hashed filename
    chomp ($hash);          # strip the trailing newline
    my $file = '/var/spool/wwwoffle/http/' . $site . '/' . $hash;
    print "./http/$site/$hash\n" if -f $file;
}

while (<>) {
    if (m%^Fetch Success http://([^/]+)/(.*)%) {
        filetest ($1, $2);
    } elsif (m%^Not fetching http://([^/]+)/(.*) .Page Unchanged.$%) {
        # the dots match the punctuation around the message in the log
        filetest ($1, $2);
    }
}
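For completeness, the delivery step I have in mind looks like the sketch
below. The directory and filename are stand-ins for the "./http/site/Dhash"
lines the script prints; normally the list would come from
"wwwoffle -fetch 2>&1 | perl fetchfiles.pl" run in /var/spool/wwwoffle.

```shell
# Build a dummy cache entry standing in for a real fetched file.
mkdir -p /tmp/demo/http/example.com
echo dummy > /tmp/demo/http/example.com/Dabc
cd /tmp/demo

# Normally: wwwoffle -fetch 2>&1 | perl fetchfiles.pl > batch.list
printf './http/example.com/Dabc\n' > batch.list

# Pack only the files from this batch for the LAN machine.
tar -czf /tmp/batch.tar.gz -T batch.list
tar -tzf /tmp/batch.tar.gz
```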