On Thu, 5 Dec 2019 17:07:05 +0100
Matus UHLAR - fantomas wrote:
It seems some big mails were too large to scan, and SA even got killed.

[2146809.213586] Out of memory: Kill process 3660 (spamassassin) score 365 or sacrifice child
[2146809.213613] Killed process 3660 (spamassassin) total-vm:2960664kB, anon-rss:2921892kB, file-rss:0kB, shmem-rss:0kB
[2146809.270342] oom_reaper: reaped process 3660 (spamassassin), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

I see the mail body contains nearly 20MB uuencoded text (don't ask).

I found some body rules that contain ".*" instead of a sane
quantifier:

72_active.cf:rawbody  __HAS_HREF              /^[^>].*?<a href=/im
72_active.cf:rawbody  __HAS_HREF_ONECASE      /^[^>].*?<(a href|A HREF)=/m
72_active.cf:rawbody  __HAS_IMG_SRC           /^[^>].*?<img src=/im
72_active.cf:rawbody  __HAS_IMG_SRC_DATA      /^[^>].*?<img src=['"]data/im
72_active.cf:rawbody  __HAS_IMG_SRC_ONECASE   /^[^>].*?<(img src|IMG SRC)=/m

There are other checks that use the "*" quantifier, though.
Would it be reasonable to replace them with {0,1000} globally?

On 05.12.19 17:21, RW wrote:
In rawbody rules the text is broken into chunks of 1024 to 2048 bytes,
so the worst case isn't all that much worse than with {0,1000}.
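
A minimal sketch of that point, using Python's re module as a stand-in for Perl (re.I and re.M play the role of /i and /m). The 2048-character chunk is built by hand from the figure quoted above; it is an assumption for illustration and is not produced by SpamAssassin's own chunking code. Within one chunk, ".*?" can never span more than the chunk itself, so the bounded form saves at most about half the work. It also changes what the rule matches:

```python
import re

CHUNK_MAX = 2048  # upper chunk size quoted above (assumed, not read from SA)

# The rule as shipped, and a hypothetical variant with a bounded quantifier.
unbounded = re.compile(r"^[^>].*?<a href=", re.I | re.M)
bounded = re.compile(r"^[^>].{0,1000}?<a href=", re.I | re.M)

# One single-line chunk of exactly CHUNK_MAX characters, tag at the very end.
tail = "<a href="
chunk = "x" * (CHUNK_MAX - len(tail)) + tail

unbounded_hit = unbounded.search(chunk) is not None
bounded_hit = bounded.search(chunk) is not None

print(len(chunk))      # 2048
print(unbounded_hit)   # True: .*? may span the whole chunk, but no further
print(bounded_hit)     # False: the tag sits more than 1000 characters in
```

So a global replacement with {0,1000} would not only cap the scan length, it could also stop a rule from firing on long lines.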

Also, /m means that .* won't cross a line boundary in the decoded text,
and ^ can match in the middle of the chunk. This makes the average
processing time less sensitive to any upper limit on .*.
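
A small sketch of those two effects, again using Python's re module as a stand-in for Perl (re.MULTILINE for /m; "." excludes newline in both unless /s or re.DOTALL is set). The sample chunk and URL are made up for illustration:

```python
import re

HAS_HREF = re.compile(r"^[^>].*?<a href=", re.IGNORECASE | re.MULTILINE)

# A long line without a tag, followed by a short line that contains one.
long_line = "x" * 1500
sample = long_line + "\n" + 'see <a href="http://example.com/">here</a>\n'

m = HAS_HREF.search(sample)
print(m.start())   # 1501: ^ matched after the newline, mid-chunk
print(m.group())   # see <a href=

# A tag split across two lines is not found, because .*? stops at the newline.
split_tag = 'text <a\nhref="http://example.com/">'
split_match = HAS_HREF.search(split_tag)
print(split_match)  # None
```

Each attempt is confined to a single line, so the cost depends on line lengths within the chunk rather than on any explicit cap on the quantifier.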

So it is not the quantifiers that cause SA to use too much memory?

Any idea how to debug that?

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
The 3 biggets disasters: Hiroshima 45, Tschernobyl 86, Windows 95
