Andy Rabagliati <[EMAIL PROTECTED]> writes:
> At the moment a site's files, and the URL unhash, are kept in a flat
> directory /var/spool/wwwoffle/http/www.domain.com/*.
>
> These directories can get really big, and can take a significant
> time to open.
>
> Can you hash these into subdirectories please, like squid and co.?
When I tried squid (a long time ago, it might have changed now) there
were no directories per host; instead there was an enormous set of
pre-created directories. These were then filled at random as new
files were cached, so that each new file was equally likely to go
into any directory.
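To make that concrete, here is a minimal sketch of such a scheme, based
only on the description above and not on squid's actual code; the
directory count and the use of MD5 are assumptions for illustration:

```python
# Sketch of a squid-style hashed cache layout (an assumption, not
# squid's real implementation): a fixed set of pre-created
# subdirectories is filled by hashing each URL, so every directory
# is equally likely to receive a new file.
import hashlib

NUM_DIRS = 256  # hypothetical number of pre-created directories


def cache_subdir(url):
    """Pick one of NUM_DIRS pre-created subdirectories for a URL."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    bucket = digest[0] % NUM_DIRS  # uniform spread over the directories
    return "%02X" % bucket


print(cache_subdir("http://www.domain.com/index.html"))
```

The point is that the directory a file lands in has nothing to do with
its host, which is the opposite of the per-site layout WWWOFFLE uses.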
As you say, the WWWOFFLE cache is different: there is a directory for
each site that contains all of the files for that site.
The problem with the change that you are suggesting is that there is a
wide variation in the number of files stored in each directory.
For example, I currently have 808 directories for different hosts that
I have cached files from. Of this total there are 195 directories
that have only a single URL stored in them (24.4%). Directories with
5 URLs or fewer make up 47.3% of the total, 10 URLs or fewer 59.8%,
and 20 URLs or fewer 75.5%. I don't know what you would consider a
large directory that would be slow, but I would guess 256 files is OK.
Since each URL is stored as two files, that corresponds to 128 URLs,
which covers 97.4% of the directories. This means that fewer than 3%
of the directories would benefit from this change.
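If you want to reproduce these figures on your own cache, something
like the following would do it; the spool path is the one quoted at
the top of this message, and the division by two reflects that each
URL is stored as a pair of files (data plus the URL file), so the
count is only an approximation:

```python
# Rough sketch: count approximate URLs per host directory in a
# WWWOFFLE-style spool and see what fraction of directories fall
# under a given size. The path is an assumption; adjust to taste.
import os


def host_url_counts(spool="/var/spool/wwwoffle/http"):
    """Approximate number of URLs stored in each host directory."""
    counts = []
    for host in os.listdir(spool):
        path = os.path.join(spool, host)
        if os.path.isdir(path):
            # two files per URL, so halve the entry count
            counts.append(len(os.listdir(path)) // 2)
    return counts


def fraction_at_most(counts, limit):
    """Fraction of host directories holding no more than limit URLs."""
    return sum(c <= limit for c in counts) / float(len(counts))
```

Running `fraction_at_most(host_url_counts(), 128)` would give the
97.4%-style figure for your own spool.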
What size directories do you have that cause problems?
What are you doing when you notice the delay (is it creating the host
index, opening a URL from the host, or something else)?
What filesystem are you using? ReiserFS has all sorts of features to
speed up directory accesses; perhaps that would help.
--
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop [EMAIL PROTECTED]
http://www.gedanken.demon.co.uk/
WWWOFFLE users page:
http://www.gedanken.demon.co.uk/wwwoffle/version-2.7/user.html