"Paul A. Rombouts" <[EMAIL PROTECTED]> writes:
> It has come to my attention that nearly half of the files in the WWWOFFLE
> cache are small files used for storing the URLs of the cached webpages.
> These files have names consisting of the letter U followed by a
> Base64-encoded md5 hash and I will refer to them as U* files.
>
> Having many small files is quite inefficient. For example, on my system
> each U* file occupies 4K of disk space. With 78587 U* files that is 307MB
> of occupied disk space. However, the total size of these files is little
> more than 5MB.
>
> In the past few weeks I have been experimenting with a design where the
> URLs of the cached webpages are stored in a single compact database file
> called "urlhashtable". This file is mmapped to an area of address space
> that is shared between all WWWOFFLE processes.
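
The figures quoted above can be checked with a rough calculation. This is
only a sketch: it assumes that "4K" means a 4096-byte filesystem block, that
every U* file occupies exactly one block, and that the total data size is
rounded down to 5 MiB. The function name allocated_bytes is made up for
this example.

```python
import math

def allocated_bytes(file_size, block_size=4096):
    """Space a non-empty file occupies when stored in whole blocks."""
    return math.ceil(file_size / block_size) * block_size

n_files = 78587                    # number of U* files quoted above
data_bytes = 5 * 1024 * 1024       # "little more than 5MB", taken as 5 MiB
avg_size = data_bytes / n_files    # average bytes of real data per file

# Every file is far smaller than one block, so each one costs a full block.
on_disk = n_files * allocated_bytes(avg_size)
on_disk_mib = on_disk / (1024 * 1024)
overhead = 1 - data_bytes / on_disk   # fraction of allocated space left unused

print(f"average file size: {avg_size:.0f} bytes")
print(f"space on disk:     {on_disk_mib:.0f} MiB")
print(f"unused fraction:   {overhead:.1%}")
```

Under these assumptions the result agrees with the 307MB quoted, and almost
all of the allocated space holds no data.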

I try to keep WWWOFFLE simple, which is a good and often recommended
way of writing software. It is a method that tends to produce robust
software.

I like being able to keep all of the files relating to one host
together in one directory. I can delete any or all of the files for a
host (it is best to delete the matching U* file for each D* file I
delete) and the program keeps on working. I can copy the host
directory (or files from a host directory) between machines without
needing to worry about WWWOFFLE failing to work. I don't even need to
tell WWWOFFLE that I have done this; it will work it out for itself.
There is no one special file that contains all of the magic to enable
the program to work. Even better, there is no single file that will
cause the program to fail if it gets lost or corrupted.
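
As an illustration of how a U* file and a D* file can be matched by name
alone, here is a minimal sketch in Python. It is not WWWOFFLE's code. It
assumes that both names share the same hash of the URL, and it uses the
URL-safe Base64 alphabet without padding, which may differ from the
encoding WWWOFFLE really uses. The function name cache_file_names is made
up for this example.

```python
import base64
import hashlib

def cache_file_names(url):
    """Return hypothetical (U-name, D-name) cache file names for a URL."""
    digest = hashlib.md5(url.encode("utf-8")).digest()   # 16-byte MD5 hash
    # Assumption: URL-safe Base64 with the "=" padding removed, so that the
    # result contains no "/" and can be used as a file name.
    token = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return "U" + token, "D" + token

u_name, d_name = cache_file_names("http://www.example.com/")
print(u_name, d_name)
```

Because the names depend only on the URL, no index file is needed to pair
them up, and files can be deleted or copied between machines freely.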

I also keep WWWOFFLE simple by not having the processes communicate
with each other. WWWOFFLE uses many separate processes rather than
multi-threading, and it uses no inter-process communication. This means
that any of the processes can die or start corrupting memory without
affecting any other.

On the other hand, anybody is free to modify WWWOFFLE for their own
personal use or any other use allowed by the license. This is one of
the freedoms that free software gives you. You may want to do this to
learn about programming, to make a better WWWOFFLE or for any other
reason. Have fun and enjoy yourself.

--
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop [EMAIL PROTECTED]
http://www.gedanken.demon.co.uk/
WWWOFFLE users page:
http://www.gedanken.demon.co.uk/wwwoffle/version-2.8/user.html