Christian Knoke <[EMAIL PROTECTED]> writes:
> On Wed, Sep 14, 2005 at 06:43:52PM +0100, Andrew M. Bishop wrote:
>
> > This robust cache concept is one of the key drivers to the format of
> > the cached directory. There is no central index that must be
> > preserved.
>
> I've been running wwwoffle for years on machines more than 5 years old. I
> have the impression that the multiple hard disk accesses needed to find a
> file cause a noticeable time penalty when using wwwoffle with big caches on
> slow drives.
If WWWOFFLE is given a URL then it can work out the filename in a
short time. All it needs is to calculate the md5 hash of the URL and
extract the protocol and hostname. I don't think that it can be made
much quicker. Even with a database of URLs in RAM, there would still
need to be a conversion from URL to filename, which would probably be
implemented with a hash function.
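To illustrate why the forward lookup is cheap, here is a minimal sketch
of a URL-to-filename mapping that needs no index at all. The directory
layout (<spool>/<protocol>/<host>/D<md5-hex>) and the function name
cache_path are hypothetical and chosen for illustration; they are not
WWWOFFLE's exact on-disk naming scheme.

```python
import hashlib
import os
from urllib.parse import urlsplit


def cache_path(spool_dir: str, url: str) -> str:
    """Map a URL to a cache filename using only string operations.

    Hypothetical layout: <spool>/<protocol>/<host>/D<md5-hex>.
    No disk access and no index lookup is needed to compute the path.
    """
    parts = urlsplit(url)
    protocol = parts.scheme            # e.g. "http"
    host = parts.netloc                # hostname (and port, if any)
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return os.path.join(spool_dir, protocol, host, "D" + digest)


if __name__ == "__main__":
    print(cache_path("/var/spool/wwwoffle", "http://www.example.com/index.html"))
```

The only work is one hash and some string splitting, so the time spent
is dominated by the single open() of the resulting file, not by
finding it.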
Creating an index in the opposite direction is slower. Given a
hostname or a lasttime directory, creating the index requires opening
many disk files, one for each URL. This is a much less common
operation, so it is not as important as the normal URL to filename
lookup.
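A minimal sketch of the reverse direction, assuming (hypothetically)
that each cached entry has a companion file whose name starts with "U"
and whose contents are the original URL. The point is the loop: one
open() per cached URL, which is why building this index costs far more
than the forward lookup.

```python
import os


def urls_for_host(host_dir: str) -> list[str]:
    """Recover the URLs cached for one host by reading each URL file.

    Hypothetical layout: every cached entry has a companion file whose
    name starts with "U" and whose contents are the original URL.
    """
    urls = []
    for name in sorted(os.listdir(host_dir)):
        if not name.startswith("U"):
            continue  # skip data files and anything else in the directory
        # One open() per cached URL: this is the cost that makes the
        # reverse direction slow on a large cache.
        with open(os.path.join(host_dir, name), encoding="utf-8") as f:
            urls.append(f.read().strip())
    return urls
```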
> Maybe using reiserfs would speed up things.
It might. Alternatively, ext3 with hashed B-tree directory indexes
(enabled with 'tune2fs -O dir_index') might also be faster. Perhaps
this works with ext2 as well, but I'm not sure from the tune2fs man page.
> But shouldn't wwwoffle build up such an index in RAM, on startup?
I don't think that the complexity of maintaining such a database is
worth the effort, since the savings in time would be small.
--
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop [EMAIL PROTECTED]
http://www.gedanken.demon.co.uk/
WWWOFFLE users page:
http://www.gedanken.demon.co.uk/wwwoffle/version-2.8/user.html