On 10 Jun 2016, at 6:17, Merijn van den Kroonenberg wrote:
> [...]
>
> From the manual:
>
> This is a best-effort advisory setting, processing will not be abruptly
> aborted at an arbitrary point in processing when the time limit is
> exceeded, but only on reaching one of locations in the program flow
> equipped with a time test. Currently equipped with the test are the main
> checking loop, asynchronous DNS lookups, plugins which are calling
> external programs. Rule evaluation is guarded by starting a timer (alarm)
> on each set of compiled rules.
The last line is critical. The alarm isn't even on individual rules but
rather on precompiled sets of rules. I already knew it was soft but that
particular detail had not stuck in any of the many times I've read that
man page. Thanks for quoting it.
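To make the granularity concrete, here's a minimal hand-rolled sketch of
that kind of guard (not SpamAssassin's actual code; run_rule_set, the
$rule hash layout, and the deadline argument are all invented for
illustration). One alarm covers the whole set, so an individual rule that
runs long is only caught after its own match finally returns:

    use strict;
    use warnings;

    # Illustrative only: one timer guards a whole set of precompiled
    # rules, not each individual rule.
    sub run_rule_set {
        my ($rules, $body, $deadline_secs) = @_;   # hypothetical arguments
        my @hits;
        eval {
            local $SIG{ALRM} = sub { die "__timeout__\n" };
            alarm($deadline_secs);
            for my $rule (@$rules) {
                # Each $rule->{re} is a precompiled qr// pattern. If one
                # of them backtracks for minutes, the ALRM handler can't
                # run until that single match returns.
                push @hits, $rule->{name} if $body =~ $rule->{re};
            }
            alarm(0);
        };
        alarm(0);   # make sure the timer is off even if the eval died
        my $timed_out = ($@ && $@ eq "__timeout__\n") ? 1 : 0;
        return (\@hits, $timed_out);
    }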
> What does this mean, can still a single operation take more than this
> time_limit? But I guess the timer on the rules means the rules at least
> cannot take more than time_limit, right?
Nope: time_limit is in seconds and I had it set to 270. The default is
300, which is half the canonical SMTP EOD timeout but matches the
canonical "server timeout" (how long a server should wait for a client
command; see RFC 5321), and I'm sure there's misunderstanding of that
distinction out there. Since under peak loads the plumbing of that
particular system can add whole seconds and some clients can be a bit
impatient, I gave it what I thought was very generous additional headroom,
although it really didn't need it. The actual normal spamd scan-time
distribution there is roughly 2/3 under 1 second, 90% under 2 seconds,
99% under 10 seconds, and 99.9% under a minute. There's a long thin tail
out to ~2 minutes, but until last week I'd never had anything actually hit
the timeout that I can recall, and the first bad rule was hardly fresh: I
think it is >8 years old, and the others I found had not caused trouble in
their multiple (3?) years of existence.
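For reference, that knob is just a number of seconds in the site-wide
configuration (270 is my setting, 300 the shipped default), something
along these lines in local.cf:

    # local.cf -- advisory per-message scan deadline, in seconds
    time_limit 270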
I figured out which rule was the proximal cause by running the message
through the spamassassin script with rule debugging turned on, so I could
narrow down the bad rule based on what matched right before
TIME_LIMIT_EXCEEDED.
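Roughly like this (the file names are placeholders); the debug output goes
to stderr, so keeping it in its own file makes it easy to grep for the
last few rule hits before the timeout:

    # run the standalone script on the offending message with the
    # 'rules' debug channel enabled
    spamassassin -D rules < slow-message.eml > /dev/null 2> rules-debug.log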
I didn't dtrace the process to nail it down, but my hypothesis is that
Perl is ultimately calling its low-level internal equivalent of regexec()
which, like POSIX regexec(), has no timeout facility: it runs until it
matches or exhausts its supply of starting points. At the Perl level I've
never encountered any way to limit how long '=~' takes to operate, but I'm
no Larry Wall, so maybe there's some arcane way to do that. It seems clear
that if Perl does have a way to break out of an operator routine that is
taking too much clock time, it has not been used in SpamAssassin. I'd bet
on there NOT being such a feature and, further, on the process doing the
match maybe not even dying immediately from a SIGKILL while it's inside
that call.
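That hypothesis is easy to poke at from plain Perl. A minimal sketch (the
pattern and input are contrived, and how slow the match actually is
depends on your Perl version and its regex optimizer): under the default
deferred "safe" signal handling, the ALRM handler doesn't get to run until
the match op returns, however long that takes.

    use strict;
    use warnings;
    use Time::HiRes qw(time);

    my $fired_at;
    $SIG{ALRM} = sub { $fired_at = time() };

    # Contrived backtracking bait: many ways to carve up the a's, and the
    # trailing '!' guarantees the overall match fails.
    my $line    = ('a' x 28) . '!';
    my $started = time();
    alarm(1);                              # ask for a signal after 1 second
    my $matched = $line =~ /^(?:a|aa)+$/;  # a single slow '=~'
    alarm(0);

    printf "match returned after %.1fs; ALRM handler ran %s\n",
        time() - $started,
        defined $fired_at ? sprintf('after %.1fs', $fired_at - $started)
                          : 'never (match finished first)';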
What most annoys me about this is that the potential for blowing up
systems with REs is the first thing one learns about them in a formal
setting (rather than just by reading man pages above the BUGS section).
Back when I first got that warning, the emphasis was on the ability of REs
to compile to disastrous size, but back then that meant a few megabytes,
and who cares about that today? However, I also got the warning, decades
ago, that '.*' could make an RE take a long time if you didn't take care
to limit your input size and write the RE to rule out most starting points
fast. But again, absolute sizes matter, and until last week I'd not
envisioned anyone "optimizing" HTML by removing all formally unnecessary
whitespace, including line breaks. This is obviously somewhat rare, but
it's apparently A Thing HTML Parsers Like, and this was a big hunk of
HTML, so I guess optimizing parsing was important...
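To illustrate the shape of the tightening (these patterns are invented for
illustration, not any of the actual rules involved): against a single
newline-free "line" of minified HTML, every unbounded wildcard is free to
roam the whole document and can retry from a huge number of split points
when the overall match fails, while bounded gaps both reject most starting
points quickly and cap how far the engine can wander from each one.

    # Invented examples, not real SpamAssassin rules.
    # Sloppy: each .* may span the entire minified document.
    my $sloppy  = qr/<a\b.*href.*click\s+here/is;

    # Tighter: the gaps are bounded and exclude characters that would
    # carry the match past the end of the tag or past the start of the
    # next one, so a failed starting point is abandoned after a bounded
    # amount of work.
    my $tighter = qr/<a\b[^>]{0,256}\bhref\b[^<]{0,512}click\s+here/is;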
It will be interesting to watch this week's scan-time stats and see
whether my tightening up of sloppy rules has an impact. I expect it will,
since I now have a concrete theory to explain that long tail out to 2
minutes, which until now I'd ignored as pure noise.