Andrew M. Bishop wrote:
I try and keep WWWOFFLE simple, which is a good and often recommended way of writing software. It is a method that tends to produce robust software.
Yes, I remember that you are a strong believer in KISS (Keep It Simple, Stupid). However, having a background in physics, I like the saying attributed to Einstein better: "Make everything as simple as possible, but not simpler."
I like the ability to be able to keep all of the files relating to one host together in one directory. I can delete any or all of the files (it is best if I delete the matching U* file for each D* I delete) for a host and the program keeps on working. I can copy the host directory (or files from a host directory) between machines without needing to worry about WWWOFFLE failing to work. I don't even need to tell WWWOFFLE that I have done this, it will work it out for itself.
I appreciate that there are big advantages to distributing the information over many separate files the way you have done. However, I have also discovered some major disadvantages. One of them is that small files take up a lot of disk space compared to the information they contain. This problem can be ameliorated by choosing a file system that can deal better with small files, but it still leaves the second problem: reading many separate files takes a lot of time. My main motivation for making the changes I described in my previous posting was my increasing dissatisfaction with the long time it takes to generate index pages of spool directories containing many web pages. My recent modifications have effected a very considerable improvement and I benefit from them daily.
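The overhead described above is easy to demonstrate. The following is a toy sketch (assumed file names and record format, not WWWOFFLE's actual spool layout): it writes a few thousand one-line "URL" files, writes the same records concatenated into a single index file, and times reading each back.

```python
import os
import tempfile
import time

# Toy demonstration, not WWWOFFLE code: 2000 one-record files
# versus the same records concatenated into one index file.
n = 2000
records = [f"http://example.com/page{i}\n".encode() for i in range(n)]

spool = tempfile.mkdtemp()
for i, rec in enumerate(records):
    with open(os.path.join(spool, f"U{i:04d}"), "wb") as f:
        f.write(rec)

index = os.path.join(spool, "index")
with open(index, "wb") as f:
    f.writelines(records)

# Read every record back one file at a time.
t0 = time.perf_counter()
many = []
for i in range(n):
    with open(os.path.join(spool, f"U{i:04d}"), "rb") as f:
        many.append(f.read())
t_many = time.perf_counter() - t0

# Read the same records back in a single pass over the index file.
t0 = time.perf_counter()
with open(index, "rb") as f:
    one = f.readlines()
t_one = time.perf_counter() - t0

# Both yield identical data; the single read avoids one
# open()/close() system-call pair per record.
print(f"{n} files: {t_many:.4f}s  one file: {t_one:.4f}s")
```

On a cold cache the gap is larger still, since each small file costs at least one seek and typically a full filesystem block of storage.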
There is no one special file that contains all of the magic to enable the program to work. Even better there is no single file that will cause the program to fail if it gets lost or corrupted.
Should my url database file get completely lost or corrupted, this will only affect the ability to generate index pages, not the ability to read pages offline or write them online. When I run "wwwoffle -purge" a copy of the old url database is kept as a backup, so the most likely scenario in case of a failure is that I lose information about some of the most recently visited URLs (the names, not the content), but certainly not everything.
What you are effectively saying is that the file system is the best possible database system for WWWOFFLE. For storing the contents of web pages, I agree, the file system works quite well. But for storing many smaller snippets of information, the file system is not a satisfactory database system. If you want to argue that my particular implementation of a URL database is not optimal, I am open to suggestions. But my experiments have shown that the file system is a poor choice (performance-wise) for storing the names of URLs.
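To make the "small snippets" point concrete, here is a hypothetical sketch of the idea (not the actual patch being discussed): the URL names live in one key/value database, keyed by invented spool file names, so generating an index page is a single database scan instead of one open()/read() per spool file.

```python
import dbm.dumb  # portable pure-Python key/value store from the stdlib
import os
import tempfile

# Hypothetical spool-name -> URL mapping; the keys are made up for
# illustration and do not match WWWOFFLE's real hashed file names.
d = tempfile.mkdtemp()
db_path = os.path.join(d, "urls")

with dbm.dumb.open(db_path, "c") as db:
    db[b"Dabc123"] = b"http://example.com/index.html"
    db[b"Dabc124"] = b"http://example.com/page2.html"

# Building an index page is now one scan of a single database,
# not a directory walk touching thousands of tiny files.
with dbm.dumb.open(db_path, "r") as db:
    urls = sorted(db[k] for k in db.keys())

print(urls)
```

Any single-file store (gdbm, Berkeley DB, or a hand-rolled format) would do; the point is only that the names move out of the per-URL files.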
I also keep WWWOFFLE simple by not having the processes communicate between themselves. WWWOFFLE has many processes rather than multi-threading or any other inter-process communication. This means that any of the processes can die or start corrupting memory without affecting any other.
The WWWOFFLE processes communicate via the file system and to a limited extent via their exit status. If the WWWOFFLE processes were completely isolated from each other, they wouldn't be able to do anything useful. A runaway WWWOFFLE process can still corrupt the file system and thus cause the server as a whole to fail. Of course if you use shared memory, you have to take precautions to prevent failures from propagating disastrously. But only a limited area of memory is shared, the rest is protected. That is not fundamentally different from using a shared file system, which is also a form of shared memory.
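The two channels mentioned above can be illustrated with a toy example (this is not WWWOFFLE's actual mechanism): a worker process reports detailed results through a file on the shared file system, and a pass/fail summary through its exit status.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical worker: writes its detailed result to a file passed as
# argv[1], and signals overall success via its exit status.
worker = r'''
import sys
with open(sys.argv[1], "w") as f:
    f.write("fetched 3 pages\n")
sys.exit(0)  # exit status carries the success/failure summary
'''

fd, path = tempfile.mkstemp()
os.close(fd)

# With "-c", sys.argv[1] inside the child is the extra argument (path).
proc = subprocess.run([sys.executable, "-c", worker, path])

with open(path) as f:
    detail = f.read()
os.unlink(path)

print(proc.returncode, detail.strip())
```

Even in this minimal form, the processes are not isolated: a buggy worker that writes garbage to the shared file corrupts the parent's view of the world, which is the sense in which a shared file system is itself a form of shared state.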
On the other hand anybody is free to modify WWWOFFLE for their own personal use or any other use allowed by the license. This is one of the freedoms that free software gives you. You may want to do this to learn about programming, to make a better WWWOFFLE or for any other reason. Have fun and enjoy yourself.
I am perfectly aware of this, as I have expressed on my web page. Out of gratitude to writers of free software who have made their work freely available so that I can learn from them, I have taken the trouble to make my modifications available so that others can learn from my ideas. If you choose to ignore them because they don't mesh with your personal dogma, I think that is a pity, because I believe there is still a lot of room for improvement in WWWOFFLE.
-- Paul Rombouts
