On Thu, 5 Dec 2019 17:07:05 +0100 Matus UHLAR - fantomas wrote:

> Hello,
>
> seems some big mails were too long to scan, and SA even got killed.
>
> [2146809.213586] Out of memory: Kill process 3660 (spamassassin) score 365 or sacrifice child
> [2146809.213613] Killed process 3660 (spamassassin) total-vm:2960664kB, anon-rss:2921892kB, file-rss:0kB, shmem-rss:0kB
> [2146809.270342] oom_reaper: reaped process 3660 (spamassassin), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> I see the mail body contains nearly 20MB uuencoded text (don't ask).
>
> I found some body rules that contain ".*" instead of a sane quantifier:
>
> 72_active.cf:rawbody __HAS_HREF /^[^>].*?<a href=/im
> 72_active.cf:rawbody __HAS_HREF_ONECASE /^[^>].*?<(a href|A HREF)=/m
> 72_active.cf:rawbody __HAS_IMG_SRC /^[^>].*?<img src=/im
> 72_active.cf:rawbody __HAS_IMG_SRC_DATA /^[^>].*?<img src=['"]data/im
> 72_active.cf:rawbody __HAS_IMG_SRC_ONECASE /^[^>].*?<(img src|IMG SRC)=/m
>
> There are different checks that have the "*" quantifier tho.
> Is it reasonable to replace them with {0,1000} globally?

In rawbody rules the text is broken into chunks of 1024 to 2048 bytes, so .* can scan at most about 2 KB per attempt and the worst case isn't all that much worse than with {0,1000}. Also, because these rules use /m and not /s, .* won't cross a line boundary in the decoded text, and ^ can match in the middle of a chunk. This makes the average processing time less sensitive to any upper limit on .*.
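
To make the point about /m concrete, here is a minimal sketch. It uses Python's re module as a stand-in for Perl (for these particular patterns the behaviour of ^, . and the i/m flags is the same), and the sample chunks are made up for illustration; this is not how SpamAssassin itself runs the rules.

```python
import re

# __HAS_HREF as quoted above, plus a variant with the proposed {0,1000} limit.
UNBOUNDED = re.compile(r'^[^>].*?<a href=', re.I | re.M)
BOUNDED = re.compile(r'^[^>].{0,1000}?<a href=', re.I | re.M)


def first_match_span(pattern, chunk):
    """Return the (start, end) span of the first match, or None."""
    m = pattern.search(chunk)
    return m.span() if m else None


# Hypothetical chunk: the link sits on the second line, so the match has to
# start at a ^ in the middle of the chunk, not at the start of the chunk.
chunk = 'plain first line\nsee <a href="http://example.com">here</a>\n'

# Hypothetical chunk: the only "<a href=" begins its own line. [^>] consumes
# the "<", and "." cannot reach back across the newline from the line before,
# so neither variant matches.
split_chunk = 'text before the break\n<a href="http://example.com">\n'

if __name__ == '__main__':
    for name, pattern in (('unbounded', UNBOUNDED), ('bounded', BOUNDED)):
        print(name, first_match_span(pattern, chunk),
              first_match_span(pattern, split_chunk))
```

Both variants behave the same on these inputs: the scan done by .* is already capped by the end of the line, and the line is capped by the chunk size, so the explicit {0,1000} only matters for lines longer than about 1000 characters.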