My mail server supports a couple hundred users, but only about 10 or 15 of
them use SpamAssassin. Most of us using spamc/spamd are
root/hostmaster/postmaster types who get around 1,500 emails a day. We
filter with procmail, using a recipe like this in each user's ~/.procmailrc
(not in /etc/procmailrc, so people who use procmail aren't forced to use it):

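# Filter (f) each message through spamc and wait (w) for it to finish;
# -f tells spamc to pass the message through unchanged if spamd is unreachable
:0fw
| /usr/bin/spamc -f

# If the filter above failed (e), make procmail exit with spamc's exit status
:0e
{
EXITCODE=$?
}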

spamd is started in local-only mode: "spamd -d -L -F 1"

I was running 2.11 when I noticed today that the load average on the mail
server was 50. There were about 50 spamd processes, each running as the
user who had connected to spamd, and all of them were hung. strace showed
no output at all on the hung processes (this is a Red Hat Linux box). I
killed all the spamd processes, restarted SpamAssassin, and upgraded to
2.20. It ran fine all afternoon, but tonight the load average is back up
to 50.

Strangely enough, some spamd checks take only a few seconds, while others
take a long time:

  May  6 21:25:09 get spamd[19869]: clean message (2.9/9.0) for sonique:32914 in 293 seconds
  May  6 21:25:28 get spamd[21382]: clean message (0.0/9.0) for brad:128 in   8 seconds.
  May  6 21:26:31 get spamd[20503]: clean message (1.2/9.0) for sonique:32914 in 258 seconds
  May  6 21:27:08 get spamd[21012]: identified spam (15.3/8.0) for bard:5492 in 175 seconds.
  May  6 21:27:14 get spamd[21822]: clean message (0.0/9.0) for brians:5529 in   3 seconds.
  May  6 21:27:15 get spamd[21828]: clean message (0.0/9.0) for nate:32844 in   4 seconds.
  May  6 21:43:16 get spamd[24828]: clean message (1.1/9.0) for sonique:32914 in 299 seconds.
  May  6 21:43:18 get spamd[26638]: clean message (8.5/9.0) for sonique:32914 in   5 seconds.
  May  6 21:44:17 get spamd[26920]: clean message (3.9/9.0) for nate:32844 in   0 seconds.
  May  6 21:45:35 get spamd[27275]: clean message (0.0/9.0) for nate:32844 in   0 seconds.
  May  6 21:49:25 get spamd[28114]: clean message (3.9/9.0) for nate:32844 in   3 seconds.
  May  6 21:49:52 get spamd[28157]: identified spam (11.6/8.0) for frankf:5388 in   3 seconds.

It may look as though user nate's checks are consistently fast, but that's
only true in this snippet. For every user, scan times vary from 0 to
sometimes *thousands* of seconds.
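To confirm that the slow checks are spread across all users rather than a
few mailboxes, the timings could be pulled out of the mail log and
summarized per user. Here is a minimal Python sketch, assuming the raw
syslog format shown in the excerpt (one spamd entry per line, before any
mail wrapping). The regex, the function names scan_times and report, and
the 60-second "slow" cutoff are hypothetical choices for illustration, not
anything shipped with SpamAssassin.

```python
import re
import sys
from collections import defaultdict

# Hypothetical pattern for spamd 2.x result lines, inferred only from the
# excerpt in this post, e.g.:
#   May  6 21:25:09 get spamd[19869]: clean message (2.9/9.0) for sonique:32914 in 293 seconds
SPAMD_RE = re.compile(
    r"spamd\[\d+\]: (?P<verdict>clean message|identified spam) "
    r"\((?P<score>-?[\d.]+)/(?P<threshold>-?[\d.]+)\) "
    r"for (?P<user>[^:\s]+):\d+ in\s+(?P<secs>\d+) seconds"
)


def scan_times(lines):
    """Return {user: [scan durations in seconds]} for every spamd result line."""
    times = defaultdict(list)
    for line in lines:
        match = SPAMD_RE.search(line)
        if match:  # lines from other daemons (sendmail, etc.) are skipped
            times[match.group("user")].append(int(match.group("secs")))
    return dict(times)


def report(times, slow=60):
    """Print count, min, max and number of scans longer than `slow` seconds per user."""
    for user, secs in sorted(times.items()):
        n_slow = sum(1 for s in secs if s > slow)
        print(f"{user:<10} n={len(secs):<4} min={min(secs):<5} "
              f"max={max(secs):<5} slow={n_slow}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (log path is an assumption): python spamd_times.py /var/log/maillog
    with open(sys.argv[1], errors="replace") as log:
        report(scan_times(log))
```

On Red Hat the mail facility usually goes to /var/log/maillog, but check
/etc/syslog.conf. If the slow column is non-zero for nearly every user, that
supports the observation above that the delays are not tied to particular
mailboxes.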

The only strace output I could get tonight from a hung spamd was this:

  mmap2(NULL, 249856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4032c000
  brk(0x8d8f000)                          = 0x8d8f000
  mmap2(NULL, 352256, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40369000

Not very revealing. I checked vmstat and top: there was no disk bottleneck
and no memory problem, and disk space isn't an issue either. Basically, I
have hung spamd processes for no apparent reason, even though I run in
local-only mode precisely to avoid the network checks so that this kind of
thing wouldn't happen :(

I don't want to go back to living without SpamAssassin. I really hope I
find a fix soon.
-- 
A mathematician is an engine for converting coffee into theorems.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk