Remi POISSONNIER <[EMAIL PROTECTED]> wrote:
> I am a French teacher in a college (students from 11 to 16) and the network
> administrator.
> I am pleased to see that v2.6 uses a `blackList`: a list of sites that the
> server refuses to get.
> Is this list regularly updated?

If you are referring to the list that is linked to from the WWWOFFLE
hints and tips page then you should not consider it an exhaustive
list.  It is purely the list that I use, created by myself with a few
suggestions from other WWWOFFLE users.  It targets adverts, not
adult content.

> How can I use the regularly updated list from:
> ftp://ftp.univ-tlse1.fr/pub/reseau/cache/squidguard_contrib/adult.tar.gz
> which contains two lists?

Since neither of the two lists contains anything other than a list of
domains or URLs, they are very easy to convert.  The Perl script at
the bottom of this message works for me.
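To illustrate the output format, suppose the archive contained the
(hypothetical) entries `example.com` in the domains list and
`example.com/banners/` in the URLs list; the script would then emit
WWWOFFLE URL patterns like:

```
*://example.com/*
*://example.com/banners/*
```

These lines then go into the WWWOFFLE blacklist configuration.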

The only problem is that WWWOFFLE has not been optimised to handle
such a long list of sites.  It works, but there may be a slowdown
(although I did not notice one when I tried it).  It also makes the
WWWOFFLE memory usage much larger: without the list the wwwoffled
process uses 2 MB; with the latest list that you mention, the process
size increases to 19 MB.

This seems a very large amount of memory.  I would have expected the
memory used to be roughly the size of the files (~2 MB) plus some
bytes of overhead for each entry (~110k entries at perhaps 32 bytes
each, so ~3 MB more, about 5 MB in total).  This could indicate a
WWWOFFLE memory leak, or it could genuinely be the required memory.
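As a sanity check, that back-of-envelope estimate can be reproduced;
the entry count and the 32 bytes per entry are rough guesses, not
measurements:

```shell
# Rough expected memory use: file size plus per-entry overhead.
# 110000 entries and 32 bytes each are assumptions, not measurements.
files_mb=2
entries=110000
bytes_per_entry=32
overhead_mb=$(( entries * bytes_per_entry / 1048576 ))  # integer MB
echo "expected ~$(( files_mb + overhead_mb )) MB, observed 19 MB"
```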

If people want to try using WWWOFFLE with such a large list I would be
interested to hear the results.  I am confident that it will work, but
there will be a memory and speed impact.


-------------------- adult_wwwoffle.pl --------------------
#!/usr/bin/perl
use strict;
use warnings;

my $file="adult.tar.gz";

# Domains: extract adult/domains from the archive and turn each
# domain into a WWWOFFLE URL pattern matching any URL on that host.

print "\n";
print "#\n";
print "# Domains\n";
print "#\n";
print "\n";

open(DOMAINS,"tar -xOzf $file adult/domains|")
    or die "Cannot extract adult/domains from $file: $!";

while(my $domain=<DOMAINS>)
  {
   chomp($domain);
   print "*://$domain/*\n";
  }

close(DOMAINS);

# Urls: the same again for adult/urls, matching any URL with that prefix.

print "\n";
print "#\n";
print "# Urls\n";
print "#\n";
print "\n";

open(URLS,"tar -xOzf $file adult/urls|")
    or die "Cannot extract adult/urls from $file: $!";

while(my $url=<URLS>)
  {
   chomp($url);
   print "*://$url*\n";
  }

close(URLS);
-------------------- adult_wwwoffle.pl --------------------

(This could have been a shell script, but I decided to use Perl since
that is what I normally use.)
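For anyone who prefers the shell, here is a minimal sketch of the same
conversion (assuming the same adult.tar.gz layout, with adult/domains
and adult/urls holding one entry per line):

```shell
#!/bin/sh
# Convert the squidGuard adult lists into WWWOFFLE URL patterns,
# mirroring the Perl script above.

convert_lists() {
    file=$1
    printf '\n#\n# Domains\n#\n\n'
    # Each domain becomes a pattern matching any URL on that host.
    tar -xOzf "$file" adult/domains | sed 's|.*|*://&/*|'
    printf '\n#\n# Urls\n#\n\n'
    # Each listed URL becomes a prefix pattern.
    tar -xOzf "$file" adult/urls | sed 's|.*|*://&*|'
}

# Run the conversion only if the archive is present.
if [ -f adult.tar.gz ]; then
    convert_lists adult.tar.gz
fi
```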

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.6/user.html
