Re: [WWWOFFLE-Users] Re: not caching or at least deleting huge images from the cache

Andrew M. Bishop Fri, 16 Sep 2005 22:27:00 -0700

Christian Knoke <[EMAIL PROTECTED]> writes:

> On Wed, Sep 14, 2005 at 06:43:52PM +0100, Andrew M. Bishop wrote:
> 
> > This robust cache concept is one of the key drivers to the format of
> > the cached directory.  There is no central index that must be
> > preserved.
> 
> I've been running wwwoffle for years on machines more than 5 years old. I
> have the impression that the multiple hard disk accesses needed to find a
> file impose a noticeable time penalty when wwwoffle is used with big caches
> on slow drives.


If WWWOFFLE is given a URL then it can work out the filename in a
short time.  All it needs is to calculate the md5 hash of the URL and
extract the protocol and hostname.  I don't think that it can be made
much quicker.  Even with a database of URLs in RAM there still needs
to be a conversion from URL to filename which is probably implemented
with a hash function.
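To illustrate why the forward lookup is cheap, here is a minimal sketch of a URL-to-filename mapping in the spirit described above. The exact layout is an assumption for illustration, not taken from the WWWOFFLE source. It assumes the URL string is hashed as-is, that the MD5 digest is base64-encoded with '/' replaced by '-' and the padding stripped, and that files live at <spool>/<protocol>/<host>/D<hash> (data) and U<hash> (the URL). The spool path is likewise hypothetical.

```python
import base64
import hashlib
from pathlib import PurePosixPath
from typing import Tuple
from urllib.parse import urlsplit


def cache_paths(url: str, spool: str = "/var/spool/wwwoffle") -> Tuple[PurePosixPath, PurePosixPath]:
    """Map a URL to hypothetical (data, url) file paths in a WWWOFFLE-like spool.

    ASSUMPTIONS: the hashing, encoding and D/U naming scheme below are
    illustrative only and may differ from what WWWOFFLE actually does.
    """
    parts = urlsplit(url)
    # One MD5 over the URL is the only real computation needed.
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # 16 bytes encode to 24 base64 characters; the last two are '=' padding.
    key = base64.b64encode(digest).decode("ascii").replace("/", "-").rstrip("=")
    # Protocol and hostname come straight from the URL, no index needed.
    host_dir = PurePosixPath(spool) / parts.scheme / parts.netloc
    return host_dir / ("D" + key), host_dir / ("U" + key)


data_file, url_file = cache_paths("http://www.example.com/index.html")
print(data_file)
print(url_file)
```

No file is opened to compute the path, so the remaining cost is the filesystem's own directory lookup inside the host directory.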

Creating an index in the opposite direction is slower.  Given a
hostname or a lasttime directory, creating the index requires opening
many disk files, one for each URL.  This is a much less common
operation, so its speed matters less than that of the normal URL to
filename lookup.
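A minimal sketch of that reverse direction, under the same hypothetical naming assumption (each cached URL has a companion file U<hash> whose contents are the URL text). It shows why the cost grows with the number of cached pages: one file open per URL in the directory. The function name is made up for illustration.

```python
import os
from typing import Dict


def index_host_dir(host_dir: str) -> Dict[str, str]:
    """Build a hash -> URL index for one host directory by reading every U file.

    ASSUMPTION: the U<hash> naming scheme is hypothetical, used only to
    illustrate the cost of the reverse lookup.
    """
    index: Dict[str, str] = {}
    with os.scandir(host_dir) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.startswith("U"):
                # One open() per cached URL: this is what makes the reverse index slow.
                with open(entry.path, "r", encoding="utf-8", errors="replace") as f:
                    index[entry.name[1:]] = f.read().strip()
    return index
```

Data files (D<hash>) are skipped, so only the small URL files are read, but the number of opens still equals the number of cached URLs.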

> Maybe using reiserfs would speed up things.

It might, or ext3 with hashed B-trees (using 'tune2fs -O dir_index')
might also be faster (perhaps this works with ext2 as well; I am not
sure from the tune2fs man page).

> But shouldn't wwwoffle build up such an index in RAM, on startup?

I don't think that the complexity of maintaining the database is worth
the effort since the savings in time would be small.

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.8/user.html
