Felix Karpfen <[EMAIL PROTECTED]> writes:

> Dan Jacobson wrote:
> > To see what files have been fetched repeatedly recently, try
> > cd /var/cache/wwwoffle && find *time* -name U\*|xargs sed :|sort|uniq -c|sort -nr
> 
> OK - did that; a relevant extract is attached.
> 
> Some of the attached entries are monitored URLs, where the content of
> the page is known to change each day.  Others (mainly .gif entries) are
> unsolicited additions to entries that had been flagged for monitoring
> and that <may|may not> be changed; the latter are only a small fraction
> of the URLs that WWWOFFLE lists daily during the download with the
> comment "Unchanged; not fetched" (or words to that effect).  
> 
> So is there a remedy or is it just an opportunity for teeth-gnashing?

Dan's script gives people an opportunity to fine-tune their WWWOFFLE
configuration to reduce bandwidth use.
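Dan's pipeline can also be written as a short Python sketch, which may be easier to adapt or extend. Like the shell command, it assumes the cache is in /var/cache/wwwoffle, that the `*time*` directories there record recent fetches, and that each file whose name starts with `U` holds a single URL. The function name `count_repeated_fetches` is hypothetical, not part of WWWOFFLE.

```python
from collections import Counter
from pathlib import Path


def count_repeated_fetches(cache_dir):
    """Count how often each URL appears in the cache's '*time*' directories.

    Assumption (as in the shell pipeline): every file whose name starts
    with 'U' inside a '*time*' directory contains exactly one URL.
    """
    counts = Counter()
    # Mirror `find *time* -name U\*`: visit each top-level *time* directory
    # and recurse into it looking for U* files.
    for time_dir in Path(cache_dir).glob("*time*"):
        if not time_dir.is_dir():
            continue
        for url_file in time_dir.rglob("U*"):
            if url_file.is_file():
                # One URL per file; drop any trailing newline or whitespace.
                url = url_file.read_text(errors="replace").strip()
                if url:
                    counts[url] += 1
    # Mirror `sort | uniq -c | sort -nr`: most frequently fetched URLs first.
    return counts.most_common()


if __name__ == "__main__":
    for url, n in count_repeated_fetches("/var/cache/wwwoffle"):
        print(f"{n:7d} {url}")
```

Run as a script, it prints one line per URL with its count, most frequent first, much like the `uniq -c | sort -nr` output; URLs near the top that are not monitored are candidates for a longer `request-changed` setting.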

I tried the script and found no GIFs that were fetched daily; the only
URLs that appeared 8 times were the monitored ones. This is because I
have some options in my configuration file that stop certain URLs from
being re-checked often.

As an example, part of my wwwoffle.conf is below:

OnlineOptions
{
 <*://*zdnet.*/*/graphics/*>        request-changed = 4w
 <*://*zdnet.*/anchordesk/images/*> request-changed = 4w
 <*://*zdnet.*/clear/*>             request-changed = 4w
 <*://*zdnet.*/graphics/*>          request-changed = 4w

 <*://images*.slashdot.org/*>       request-changed = 4w
}

This stops the specified URLs from being re-fetched until the cached
copy is more than 4 weeks old. If I monitor any pages from these
sites, only the monitored page itself is updated.
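To illustrate the decision the `request-changed` options control, here is a rough Python model. It is a sketch under stated assumptions, not WWWOFFLE's own code: URL-specs are approximated with `fnmatch` wildcards, the first matching rule wins (with these rules every match gives 4w, so order makes no difference here), a bare number is read as seconds, only the `h`, `d` and `w` suffixes are modelled, and 600 seconds is assumed as the default for URLs that match no rule. All function and variable names are hypothetical.

```python
import fnmatch
import re

# Seconds per suffix; a bare number is treated as seconds (assumption).
SECONDS_PER_UNIT = {"": 1, "h": 3600, "d": 86400, "w": 7 * 86400}

# Rules copied from the OnlineOptions example; patterns are approximations.
RULES = [
    ("*://*zdnet.*/*/graphics/*", "4w"),
    ("*://*zdnet.*/anchordesk/images/*", "4w"),
    ("*://*zdnet.*/clear/*", "4w"),
    ("*://*zdnet.*/graphics/*", "4w"),
    ("*://images*.slashdot.org/*", "4w"),
]
DEFAULT_REQUEST_CHANGED = "600"  # assumed default for unmatched URLs


def parse_age(value):
    """Convert an age such as '4w' or '600' into seconds."""
    match = re.fullmatch(r"(\d+)([hdw]?)", value)
    if match is None:
        raise ValueError(f"unsupported age: {value!r}")
    return int(match.group(1)) * SECONDS_PER_UNIT[match.group(2)]


def request_changed_for(url):
    """Return the request-changed age (seconds) of the first matching rule."""
    for pattern, age in RULES:
        if fnmatch.fnmatchcase(url, pattern):
            return parse_age(age)
    return parse_age(DEFAULT_REQUEST_CHANGED)


def should_refetch(url, cached_age_seconds):
    """Re-request only when the cached copy is older than request-changed."""
    return cached_age_seconds > request_changed_for(url)
```

With this table, a Slashdot image cached a day ago is served from the cache, while a page that matches no rule is re-requested once its cached copy is more than ten minutes old.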

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.7/user.html
