Folks,

  I am passing the outgoing queue of requests from a disconnected LAN
  back to a well-connected machine for wwwoffle -fetch.

  After fetching, I am packing up the fetched files for delivery back
  to the LAN machine for serving.

  In order to locate all the files created from this fetch session,
  including embedded images and recursive fetches, I am parsing the
  output of wwwoffle -fetch with a perl script.

  So far, so good.
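  In essence, the matching is this (a sed sketch of what my perl
  script below does, run on two sample lines; I am assuming the fetch
  log lines look like this, and the exact wording may well differ
  between wwwoffle versions):

    sample='Fetch Success http://www.yahoo.com/
    Not fetching http://news.bbc.co.uk/low/english/world/africa/default.stm [Page Unchanged]'

    # Pull the URL out of the two kinds of log line the script matches.
    urls=$(printf '%s\n' "$sample" | sed -e 's/^ *//' |
      sed -n -e 's/^Fetch Success \(http:.*\)/\1/p' \
             -e 's/^Not fetching \(http:[^ ]*\) .*/\1/p')
    printf '%s\n' "$urls"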

  The problem that now arises is that if a second batch of requests
  arrives for another remote machine, the requests all pile up in
  the same outgoing directory, and my perl script can no longer
  differentiate between the batches.

  I could serialise all instances, but that slows things considerably.
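  The serialising would amount to a mutex around each fetch session; a
  minimal sketch with a mkdir(1) lock (the lock path and the echo
  stand-in for "wwwoffle -fetch" are my own, not part of wwwoffle):

    lockdir=/tmp/wwwoffle-fetch.lock
    until mkdir "$lockdir" 2>/dev/null; do
        sleep 1                 # another batch is fetching; wait our turn
    done
    result=$(echo "fetch batch here")   # stand-in for the real fetch
    rmdir "$lockdir"                    # release the lock
    printf '%s\n' "$result"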

  I am considering adding a --outgoing <dir> option to the wwwoffle
  program, allowing me to use the same file cache but parse the
  batches separately.

  Is there a better way to do it?

  I attach my little perl script.

Cheers,    Andy!

On Fri, 22 Feb 2002, Andy Rabagliati wrote:

> Hmm - but this will not pick up the associated images and things ?
> 
> Seems I will need to parse the wwwoffled logs ?
> 
> Cheers,   Andy!
> 
> On Fri, 22 Feb 2002, Andy Rabagliati wrote:
> 
> > Thanks very much.
> > 
> > After some experimentation, it seems it is pretty easy ..
> > 
> > [root@vision2000 wwwoffle-2.6d]# wwwoffle-ls outgoing
> > OL5cExynn7TtBZHt9CtZJ-g     349 Feb 22 13:15 http://www.yahoo.com/
> > OkuAAiJm7MiuHw9aecXwXyw     695 Feb 22 13:16 http://news.bbc.co.uk/low/english/world/africa/default.stm
> > 
> > Gives me a hashed filename starting with O.
> > 
> > After fetching these URLs, I get this :-
> > 
> > [root@vision2000 wwwoffle-2.6d]# for u in http://www.yahoo.com/ http://news.bbc.co.uk/low/english/world/africa/default.stm ; do wwwoffle-ls $u ; done
> > DL5cExynn7TtBZHt9CtZJ-g   16790 Feb 22 14:03 http://www.yahoo.com/
> > DkuAAiJm7MiuHw9aecXwXyw   13657 Feb 21 12:52 http://news.bbc.co.uk/low/english/world/africa/default.stm
> > 
> > The same hashed names but beginning with D.
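To spell out the pattern in those listings: the outgoing entry and the
cached data file appear to share the same hash, with only the leading
letter changing (O for outgoing, D for data). That is my reading of the
output, not documented behaviour, but it means one name can be derived
from the other:

    out=OL5cExynn7TtBZHt9CtZJ-g
    data="D${out#O}"            # strip the O prefix, prepend D
    echo "$data"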

#! /usr/bin/perl -w
# Parses the output of wwwoffle -fetch to pick up the files arriving in the cache.
# Andy Rabagliati <[EMAIL PROTECTED]>

use strict;

sub filetest {
    my $site = shift;
    my $url  = shift;
    return if $url =~ /\n/ || length ($url) == 0;
    my $hash = qx('/usr/local/bin/wwwoffle-ls' "http://$site/$url");
    chomp ($hash);              # trailing newline
    $hash =~ s/ .*//;           # keep only the first field, the hashed filename
    my $file = '/var/spool/wwwoffle/http/' . $site . '/' . $hash;
    print "./http/$site/$hash\n" if -f $file;
}

while (<>) {
    # Freshly fetched pages ...
    if ($_ =~ m%^Fetch Success http://([^/]+)/(.*)%) {
        &filetest ($1, $2);
    # ... and pages that were already up to date in the cache.
    } elsif ($_ =~ m%^Not fetching http://([^/]+)/(.*) .Page Unchanged.$%) {
        &filetest ($1, $2);
    }
}
