To add a comment to this (quite old) message: On Wed, 11 May 2005, Andrew M. Bishop wrote:
> "Paul A. Rombouts" <[EMAIL PROTECTED]> writes:
>
> > In the past few weeks I have been experimenting with a design where
> > the URLs of the cached webpages are stored in a single compact
> > database file called "urlhashtable". This file is mmapped to an area
> > of address space that is shared between all WWWOFFLE processes.
>
> I try to keep WWWOFFLE simple, which is a good and often recommended
> way of writing software. It is a method that tends to produce robust
> software.
>
> I like the ability to keep all of the files relating to one host
> together in one directory. I can delete any or all of the files (it is
> best if I delete the matching U* file for each D* file I delete) for a
> host and the program keeps on working. I can copy the host directory
> (or files from a host directory) between machines without needing to
> worry about WWWOFFLE failing to work. I don't even need to tell
> WWWOFFLE that I have done this; it will work it out for itself.

I make heavy use of this to scoop pages for schools which are never online. They pass their wget requests to another WWWOFFLE installation which is online. It creates a dynamic WWWOFFLE instance, pulls the request through it, and then packs up the site directories. These packed-up directories are carried by USB stick to the offline school.

I very much appreciate the ability to just drop in fresh site directories, or updated old ones, and have it 'just work'. The urlhashtable would break that ability.

My 2c.

Cheers, Andy!
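For illustration, the pack-and-carry workflow Andy describes might be sketched roughly as below. This is a hedged sketch, not his actual scripts: the spool path, host name, and D*/U* file names are stand-ins (the real WWWOFFLE spool is typically under /var/spool/wwwoffle, with one directory per host), and the wget step is shown only as a comment since it needs a live proxy. The point is the property being defended: a host's cache is an ordinary directory that can be tarred up, moved, and unpacked, and WWWOFFLE discovers the files by itself.

```shell
#!/bin/sh
set -e

# Stand-in for WWWOFFLE's per-protocol spool directory (assumption:
# a real install would use something like /var/spool/wwwoffle/http).
SPOOL=$(mktemp -d)/http
SITE=www.example.org

# Fake a cached per-host directory with matching D* (data) and
# U* (URL) files, mimicking WWWOFFLE's on-disk layout.
mkdir -p "$SPOOL/$SITE"
echo "page data" > "$SPOOL/$SITE/D-example"
echo "page url"  > "$SPOOL/$SITE/U-example"

# 1. On the online machine, the pages would be pulled through the
#    WWWOFFLE proxy (8080 is its default port), e.g.:
#      http_proxy=http://localhost:8080/ wget -r -np "http://$SITE/"

# 2. Pack up the whole per-host directory for the USB stick.
tar -czf "$SITE.tar.gz" -C "$SPOOL" "$SITE"

# 3. On the offline machine, unpack into its spool; no re-indexing
#    step is needed, WWWOFFLE works out the new files for itself.
DEST=$(mktemp -d)/http
mkdir -p "$DEST"
tar -xzf "$SITE.tar.gz" -C "$DEST"
ls "$DEST/$SITE"
```

The design choice being praised is exactly that steps 2 and 3 are plain tar and cp operations on a directory tree; a single shared urlhashtable file would mean the unpacked directories no longer match the index.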
