Hi there,


I had a long email planned here, but as I was writing it I narrowed the problem down to what I think is a recursive stat call in vdelivermail.c.


First, some background on the setup:



I'm in the process of migrating a 12 GB, ~5000-user sendmail/aliases/virtualuser system to a qmail/vpopmail one using MySQL as the backend, and a single problem is holding me up.


We've got a cluster of 3 delivery machines, with a /vpopmail partition shared over NFS. The NFS server is also the MySQL DB server that hosts the vpopmail backend. /vpopmail is on a 3Ware RAID 10 array running ReiserFS. (We've tried both the default mount options and noatime/notail.)

All of the 800 or so virtual domains are empty (save for the postmaster account) and filled with .qmail-vuser files that forward to &[EMAIL PROTECTED]. When a vpopmail user is made at one of those domains, delivery happens instantaneously. Delivering to any vpopmail user at the default domain, however, results in vdelivermail hanging for 2-10 minutes before finally delivering the message.

"vuserinfo -d [EMAIL PROTECTED]" works fine, which led me to believe it was not a MySQL table problem (we're not using many_domains).

The vdelivermail hang occurs whether delivering directly ON the NFS server or on one of the cluster servers (though the length of the delay varies unpredictably), which leads me to think that it's not an NFS problem. Standard NFS reads and writes are fine.

Additionally, copying files into and out of users' Maildirs manually works fine, and squirrelmail and courier-imap are handling the situation fine as well.

Attempted delivery to nonexistent addresses gives a failure message immediately.

Manual testing was done with a line like below, to verify it wasn't anything else in qmail:

cat /vpopmail/testing/samplemail.txt | env EXT=cleaver HOST=defaultdomain.com vdelivermail '' bounce-no-mailbox


--------


Okay, as I was writing the above message, I decided to strace the running vdelivermail process and discovered that vdelivermail was looping here:

stat64("/etc/vpopmail/domains/defaultdomain.com/5/charlenes/Maildir//new/1078418383.M015727P2293.haku.defaultdomain.com", {st_mode=S_IFREG|0644, st_size=11180, ...}) = 0
stat64("/etc/vpopmail/domains/defaultdomain.com/5/charlenes/Maildir//new/1078418397.M208677P5866.haku.defaultdomain.com", {st_mode=S_IFREG|0644, st_size=2123, ...}) = 0
stat64("/etc/vpopmail/domains/defaultdomain.com/5/charlenes/Maildir//new/1078418401.M185492P7109.haku.defaultdomain.com", {st_m


[later]
stat64("/etc/vpopmail/domains/defaultdomain.com/E/gary/Maildir//new/1078419549.M564758P6609.haku.defaultdomain.com", {st_mode=S_IFREG|0644, st_size=2744, ...}) = 0
stat64("/etc/vpopmail/domains/defaultdomain.com/E/gary/Maildir//new/1078419549.M438602P6573.haku.defaultdomain.com", {st_mode=S


It appears to be stat()ing every single message of every user underneath the default domain's directory(!). Given that about 12 GB of mail is being transferred over to the test systems (before we go live), that would explain the long delay. The time the stats take varies depending on how much is already cached by NFS or the local disk array.
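To make the suspected pattern concrete, here is a minimal sketch of the kind of recursive walk that would produce a stat call per message. It is NOT taken from vpopmail; count_tree and its parameters are hypothetical names made up for illustration, and whether vdelivermail actually does anything like this is still an open question. The point is only that if such a walk is started at the domain directory rather than at one user's Maildir, its cost grows with all the mail stored under the domain.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not vpopmail code): recursively walk `path`,
 * lstat() every entry, and accumulate the number of regular files
 * and their total size. Returns 0 on success, -1 if `path` cannot
 * be opened as a directory. */
int count_tree(const char *path, long *files, long long *bytes)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    char child[4096];
    struct stat st;
    int n;

    if (dir == NULL)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        /* Skip the self and parent entries to avoid infinite recursion. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        n = snprintf(child, sizeof child, "%s/%s", path, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof child)
            continue;               /* path too long for the buffer: skip it */

        /* One stat-family call per entry; this is what shows up in strace. */
        if (lstat(child, &st) != 0)
            continue;

        if (S_ISDIR(st.st_mode)) {
            count_tree(child, files, bytes);   /* descend into subdirectory */
        } else if (S_ISREG(st.st_mode)) {
            (*files)++;
            *bytes += st.st_size;
        }
    }

    closedir(dir);
    return 0;
}
```

Called on a single user's Maildir, this touches only that user's messages. Called on the directory of a whole domain, it touches every message of every user, which is the behaviour the strace output above seems to show.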

Any ideas on why it might be doing this? I'm looking over count_dir in vdelivermail.c right now and not seeing it. =(


Sincerely, Japheth "J.C." Cleaver


