Here's an interesting observation.
I set bayes_auto_expire to 0 as what I thought would be a temporary workaround and restarted spamd, but the hogging occurs at least as often as before. Am I looking in the wrong direction, or shouldn't this change have helped?
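For reference, this is what the change amounts to in local.cf (the path below is just where the FreeBSD port puts it; adjust as needed):

  # /usr/local/etc/mail/spamassassin/local.cf
  bayes_auto_expire 0    # disable automatic (opportunistic) token expiry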

Another observation:
# sa-learn --dump magic
bayes: cannot open bayes databases /usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed: Interrupted system call
0.000          0          3          0  non-token data: bayes db version
0.000          0     437041          0  non-token data: nspam
0.000          0     253396          0  non-token data: nham
0.000          0    4616765          0  non-token data: ntokens
0.000          0 1156977303          0  non-token data: oldest atime
0.000          0 1159200779          0  non-token data: newest atime
0.000          0 1159199860          0  non-token data: last journal sync atime
0.000          0 1158904222          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
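(Those atime columns are Unix epoch seconds; they can be converted to readable dates with the date command, e.g.

  # date -r 1158904222        (BSD date)
  # date -d @1158904222       (GNU date)

for the last expiry value.)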

The last expiry atime converts to September 22, the same day my problems started. But if the hogging continues even with bayes_auto_expire set to 0, where should I be looking instead?

Regards,
Andreas



Andreas Pettersson wrote:

Me again. Since I'm not getting any responses, I'd better keep posting more information, as I've done some more investigating today.

Sometimes when I run sa-learn --force-expire I get this response almost immediately:
Bus error (core dumped)
When I run it again, the process just hogs the CPU until I break it after about 15 minutes.
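(For what it's worth, sa-learn also has a -D/--debug switch, so something like

  # sa-learn -D --force-expire

should at least show which stage it stalls in.)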

I have also changed bayes_learn_to_journal back to 0 and lock_method to flock.
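In local.cf that now amounts to (other settings omitted):

  bayes_learn_to_journal 0
  lock_method flock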

Now I get these in spamd.log:
Mon Sep 25 17:05:18 2006 [8853] warn: bayes: cannot open bayes databases /usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed: Interrupted system call

I also lowered --max-children from 8 to 6 with this result:
Mon Sep 25 17:11:03 2006 [6702] info: prefork: server reached --max-children setting, consider raising it

Here's some top output of a typical situation:
 PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
8287 spamd    132    0 48056K 44220K RUN      8:00 88.43% 88.43% perl5.8.7
8853 spamd     20    0 40416K 38356K lockf    0:11  1.32%  1.32% perl5.8.7
9128 spamd     20    0 38592K 36544K lockf    0:03  0.63%  0.63% perl5.8.7
8879 spamd     20    0 40804K 38484K lockf    0:08  0.59%  0.59% perl5.8.7
9103 spamd     20    0 39728K 37736K lockf    0:04  0.54%  0.54% perl5.8.7

-rw-------  1 spamd  wheel        45 Sep 25 17:04 bayes.mutex
-rw-------  1 spamd  wheel    240024 Sep 25 17:15 bayes_journal
-rw-------  1 spamd  wheel   1039920 Sep 25 17:04 bayes_journal.old
-rw-r--r--  1 spamd  wheel  83787776 Sep 25 16:09 bayes_seen
-rw-------  1 spamd  wheel  85901312 Sep 25 17:04 bayes_toks

# cat bayes.mutex
8287
6708
6708
6708
6708
6708
6708
6708
6708


What is wrong?! What is making spamd go *kaboom* several times an hour?
Is token expiry somehow not working correctly?
Is it normal to have a bayes_journal.old lying around?
What more can I do to find the cause?

If the core dump (22 MB) is of any interest, I'll upload it somewhere.
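(A quick backtrace from the dump might already say something. Roughly, assuming the dump comes from the perl binary running spamd and FreeBSD's default core naming:

  # gdb /usr/local/bin/perl5.8.7 perl5.8.7.core
  (gdb) bt

The binary path and core file name are guesses.)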



Best regards,
Andreas





Andreas Pettersson wrote:

Ok, more information here.

I found this line in spamd.log from when the problem started:
Fri Sep 22 19:55:22 2006 [74581] warn: bayes: expire_old_tokens: child processing timeout at /usr/local/bin/spamd line 1082

which was followed by lots of these:
Fri Sep 22 19:55:52 2006 [74581] warn: bayes: cannot open bayes databases /usr/local/share/spamassassin/bayes/bayes_* R/W: lock failed: File exists

In an attempt to find what's wrong I changed bayes_learn_to_journal to 1. It didn't help, but at least I got rid of the 'lock failed: File exists' error messages in spamd.log, and bayes keeps working. For the moment I have a script that checks whether bayes.lock exists, kills the hogging process and removes the lock file; it runs every minute (a rough sketch of it is below).
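Roughly, the watchdog looks like this (a simplified sketch; the paths and the assumption that the first line of the lock file holds the owning PID reflect my setup and may not match others):

  #!/bin/sh
  # Simplified sketch of the once-a-minute watchdog, run from cron.
  # Assumes the first line of bayes.lock is the PID of the lock holder.
  BAYES_DIR=/usr/local/share/spamassassin/bayes
  LOCK=$BAYES_DIR/bayes.lock

  if [ -f "$LOCK" ]; then
      PID=`head -n 1 "$LOCK"`
      # Kill the spamd child sitting on the lock, then clean up the lock files
      [ -n "$PID" ] && kill -9 "$PID" 2>/dev/null
      rm -f "$LOCK" "$LOCK".*
  fi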


I have tried changing lock_method to flock; the problem is still there (but with a new lock file name). I also tried sa-learn --force-expire. It took about 30 seconds to complete, but didn't solve my problem either.


Any idea what might be wrong?

Regards,
Andreas




