On Tuesday 01 February 2005 7:32 pm, Michael Bowe wrote:
> ----- Original Message -----
> From: "Tom Collins" <[EMAIL PROTECTED]>
>
> > I've been thinking about modifying vpopmail to use directory names like
> > @a, @b, @c, etc. instead of a, b, c so that we could allow
> > one-character user directories in all cases.  I don't know how we'd
> > make that work with existing directory structures though.  Another
> > solution would be to hash all usernames, so there weren't any users in
> > the top directory.  I guess we'd still run some risks of qmailadmin
> > creating a mailing list or autoresponder with the same directory name
> > and then later deleting it.
>
> I have often wished the directory hashing worked in a more
> simple/logical fashion

We chose what looked to be the most efficient method of storing the
directories and attempted to automate as much as possible.

>
> Is the complexity of the current system really required? 
I think so. So did all the developers who worked on the code.
Our testing showed that it was required.

> It is so complex 
> that none of the developers even appear to understand how it works!
I do.

> The documentation states that vpopmail uses a self balancing tree that is
> able to support up to 23 million domains, each with up to 23 million
> users. However I am a bit doubtful about the "self balancing" part.
> For example if you add some domains and then go back and delete
> them later, further new domains don't get added to these now-vacant
> parts of the tree

You are right. The algorithm does not look for previously deleted entries
and reuse them. Since we have 23 million directory slots to use, we decided
not to bother looking for empty directory slots; we just create a new one.
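
To make the "never reuse, just create a new one" behaviour concrete, here
is a minimal sketch in Python. It is not the vpopmail code: the class name
SlotAllocator, the numbered-subdirectory layout, and the max_per_dir
parameter are all assumptions made for illustration. The only point it
shows is that deleting an entry leaves a hole that later additions skip.

```python
from typing import Dict


class SlotAllocator:
    """Hypothetical allocator: slots are handed out in order, never reused."""

    def __init__(self, max_per_dir: int = 300) -> None:
        self.max_per_dir = max_per_dir  # assumed cap on entries per directory
        self.next_slot = 0              # only ever moves forward
        self.live: Dict[str, str] = {}  # name -> relative path

    def add(self, name: str) -> str:
        # Take the next unused slot; vacated slots are never searched for.
        slot = self.next_slot
        self.next_slot += 1
        # Simplified layout: numbered subdirectory chosen by slot number.
        path = f"{slot // self.max_per_dir}/{name}"
        self.live[name] = path
        return path

    def remove(self, name: str) -> None:
        # The entry goes away, but the slot counter is left untouched.
        del self.live[name]


if __name__ == "__main__":
    alloc = SlotAllocator(max_per_dir=2)
    print(alloc.add("one.example"))
    print(alloc.add("two.example"))
    alloc.remove("one.example")
    # Directory 0 now has a vacancy, but the next entry still goes to 1.
    print(alloc.add("three.example"))
```

The trade-off is the one described above: no scan for free slots on every
add, at the cost of directories that thin out after deletions.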

>
> On other (non-vpopmail) virtual mail systems that I have used, the
> hashing system is typically much more simple - with the sysadmin
> choosing how many levels of hashing were required, and then
> just hashing the dirs using the leading portions of the username/domain
>
> eg "userhash" level of 1 means all user dirs would be hashed like this
>   exampledomain.com/
>     a/<then all the usernames starting with a>
>     b/<then all the usernames starting with b>
>     c/<then all the usernames starting with c>
>     etc
>
> a "userhash" level of 2 means that user dirs would be hashed like this
>   exampledomain.com/
>     a/aa/<then all the usernames starting with aa>
>     a/ab/<then all the usernames starting with ab>
>     a/ac/<then all the usernames starting with ac>
>     etc
>
> same sort of system could be used for a "domainhash"
>
> There is no disputing that this system would end up having some dirs with
> more entries than others, but even on a system with many user accounts,
> probably two or at the most three levels of hashing would prevent any single
> dir from becoming excessively large.  The current vpopmail system doesn't
> seem to balance the dirs evenly anyway, so it is not like we would be
> losing any functionality there.
>
> Having a logical directory layout like this also simplifies other issues.
> Eg recovering from a corruption in the file/db that stores the hashing
> info, or moving domains/accounts between servers.

Your proposed algorithm is the problem we were trying to solve.
One large vpopmail site had broken up their directories like you suggested.
But they were having heavy disk I/O, which they finally tracked down to
the OS walking through the directory structure. So we came up with the
current system. The two were benchmarked against each other. The
winner was to not use more than 300 subdirectories in any one directory.
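
For comparison, the prefix scheme quoted above can be sketched in a few
lines of Python. This is an illustration only: the function names
prefix_dir, prefix_path, and largest_bucket are made up here, and the
"level" argument stands in for the "userhash" setting from the quoted
example. It shows why bucket sizes follow the name distribution rather
than a fixed cap.

```python
from collections import Counter
from typing import Iterable


def prefix_dir(username: str, level: int) -> str:
    """Hash directory for a name, e.g. 'a/ab' for 'abel' at level 2."""
    depth = min(level, len(username))
    return "/".join(username[:i] for i in range(1, depth + 1))


def prefix_path(username: str, level: int) -> str:
    """Full relative path of the user directory under the domain."""
    d = prefix_dir(username, level)
    return f"{d}/{username}" if d else username


def largest_bucket(usernames: Iterable[str], level: int) -> int:
    """Size of the fullest hash directory for a given set of names."""
    counts = Counter(prefix_dir(u, level) for u in usernames)
    return max(counts.values())


if __name__ == "__main__":
    users = ["alice", "adam", "amy", "bob"]
    for u in users:
        print(prefix_path(u, 1))
    # Names cluster on common leading letters, so nothing bounds a bucket.
    print(largest_bucket(users, 1))
```

With this layout nothing limits how many entries land under a popular
prefix, whereas a fixed per-directory cap bounds every directory
regardless of how the names are distributed.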

Hope that helps explain why it was built this way.

Ken Jones
inter7.com

>
> Michael.
