On Thu, 5 Dec 2019 17:07:05 +0100
Matus UHLAR - fantomas wrote:

> Hello,
> 
> seems some big mails were too long to scan, and SA even got killed.
> 
> [2146809.213586] Out of memory: Kill process 3660 (spamassassin) score 365 or sacrifice child
> [2146809.213613] Killed process 3660 (spamassassin) total-vm:2960664kB, anon-rss:2921892kB, file-rss:0kB, shmem-rss:0kB
> [2146809.270342] oom_reaper: reaped process 3660 (spamassassin), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> 
> I see the mail body contains nearly 20MB uuencoded text (don't ask).
> 
> I found some body rules that contain ".*" instead of a sane
> quantifier:
> 
> 72_active.cf:rawbody  __HAS_HREF             /^[^>].*?<a href=/im
> 72_active.cf:rawbody  __HAS_HREF_ONECASE     /^[^>].*?<(a href|A HREF)=/m
> 72_active.cf:rawbody  __HAS_IMG_SRC          /^[^>].*?<img src=/im
> 72_active.cf:rawbody  __HAS_IMG_SRC_DATA     /^[^>].*?<img src=['"]data/im
> 72_active.cf:rawbody  __HAS_IMG_SRC_ONECASE  /^[^>].*?<(img src|IMG SRC)=/m
> 
> There are different checks that have the "*" quantifier tho.
> Is it reasonable to replace them with {0,1000} globally?


In rawbody rules the text is broken into chunks of 1024 to 2048 bytes,
so the worst case isn't all that much worse than with {0,1000}.

Also, /m means that .* won't cross a line boundary in the decoded text,
and ^ can match in the middle of the chunk. This makes the average
processing time less sensitive to any upper limit on .*.
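
The point about /m can be checked with a small sketch. It uses Python's
re module as a stand-in for Perl (re.I | re.M roughly corresponds to /im),
and the chunk below is made-up sample text, not real SpamAssassin input.
It only illustrates the matching behaviour, not how SA scans rawbody.

```python
import re

# Python stand-ins for the Perl rule /^[^>].*?<a href=/im and for a
# bounded variant using {0,1000}. re.I | re.M approximates /im.
HAS_HREF = re.compile(r'^[^>].*?<a href=', re.I | re.M)
BOUNDED = re.compile(r'^[^>].{0,1000}?<a href=', re.I | re.M)

# Hypothetical chunk: one uuencode-like line, one quoted line starting
# with ">", and one line that actually contains a link.
chunk = (
    "M" + "x" * 60 + "\n"
    "> quoted <a href=\n"
    'see <A HREF="http://example.invalid/">\n'
)

m = HAS_HREF.search(chunk)
b = BOUNDED.search(chunk)

# ^ matched at the start of the third line, in the middle of the chunk,
# and "." never consumed a newline, so the match stays on that line.
print(repr(m.group(0)))          # 'see <A HREF='
print(m.start() == chunk.index("see"))
print("\n" in m.group(0))        # False
print(m.span() == b.span())      # True: the bound changes nothing here
```

On lines of ordinary length the unbounded and bounded forms find the same
match; the bound only matters for a single line longer than the limit.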
