Hi -

[This message is probably only of interest to people running a
development version of vpopmail (post 4.10.3) with their domains on
NFS-mounted drives.]

I noticed that in a recent version of vpopmail (version 4.10.3) the
delivery routine (deliver_mail) stopped delivering to the Maildir/tmp
directory first (referred to from now on as /tmp). This was done to
remove a rename system call from the code. Previously the /tmp file
was linked to Maildir/new (referred to from now on as /new).

I've been looking at the new code (version 4.10.14) and it appears to
deliver straight to the /new directory now, which is incorrect if you
are trying to follow the Maildir algorithm (the one that ensures
safe/guaranteed delivery, requires no locking under NFS, etc). The
whole point of delivering first to /tmp and then linking this file to
/new is to stop corrupt messages ever hitting Maildir/new and to
ensure reliable delivery under NFS failures, machine crashes, etc.
Once a message is linked from /tmp to /new (and then stat()'d to make
sure it's there), the idea is that it is guaranteed not to be
incomplete or undelivered. After linking it to /new, and assuming
there is no error, we can remove the /tmp file.
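
For illustration, the sequence described above might look roughly like
this in C. This is only a sketch and not vpopmail's actual
deliver_mail code: the function name maildir_deliver_sketch and its
parameters are made up for the example, generating a unique message
filename is left to the caller, and syncing the data to disk is left
out to keep the focus on the /tmp and /new steps.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of vpopmail): write msg to
 * <maildir>/tmp/<name>, link it to <maildir>/new/<name>, stat() the
 * new name, then remove the tmp name.
 * Returns 0 on success, -1 on any failure. */
int maildir_deliver_sketch(const char *maildir, const char *name,
                           const char *msg, size_t len)
{
    char tmp_path[1024], new_path[1024];
    struct stat st;
    size_t off = 0;
    int n, fd;

    n = snprintf(tmp_path, sizeof tmp_path, "%s/tmp/%s", maildir, name);
    if (n < 0 || (size_t)n >= sizeof tmp_path) return -1;
    n = snprintf(new_path, sizeof new_path, "%s/new/%s", maildir, name);
    if (n < 0 || (size_t)n >= sizeof new_path) return -1;

    /* O_EXCL: fail rather than overwrite an existing tmp file. */
    fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0) return -1;

    /* The whole message is written under tmp/, so a crash here never
     * leaves a partial file in new/. */
    while (off < len) {
        ssize_t w = write(fd, msg + off, len - off);
        if (w < 0) { close(fd); unlink(tmp_path); return -1; }
        off += (size_t)w;
    }
    if (close(fd) != 0) { unlink(tmp_path); return -1; }

    /* Only a complete file is ever made visible under new/. */
    if (link(tmp_path, new_path) != 0) { unlink(tmp_path); return -1; }

    /* Make sure the new name is really there before dropping tmp;
     * on failure the tmp copy is deliberately left in place. */
    if (stat(new_path, &st) != 0) return -1;

    unlink(tmp_path);
    return 0;
}
```

A reader of the mailbox only ever looks in /new, so with this ordering
it either sees the whole message or nothing at all.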

With the development versions (4.10.3+), if the server crashes during
delivery (mid file write), this will leave a corrupt message on the
server in the /new delivery area, something the Maildir format is
designed not to do. Admittedly qmail will probably try to redeliver
the message, as it has not been delivered successfully, but without
the initial delivery to /tmp a user can receive corrupt messages.

There is also a bug for NFS in that fsync() (or fdatasync()?) is not
called on the file vdelivermail writes to in the delivery area, so
there is no guarantee it has actually been written. This means the
file can be successfully delivered according to vdelivermail but in
reality only cached locally and not written out (committed) to the
NFS drive. vdelivermail will have returned success, but if the machine
crashes at this point the mail can potentially disappear if the NFS
mount doesn't recover properly.

Is there any reason (other than optimisation) that these calls have
been removed (or, in the case of the missing fsync(), never
implemented)?

Marcus

--
Marcus Williams - http://www.onq2.com
Quintic Ltd, 39 Newnham Rd, Cambridge, CB3 9EY
