On Wed, Dec 11, 2019 at 01:12:46PM +0100, Matus UHLAR - fantomas wrote:
> >On Wed, Dec 11, 2019 at 10:53:04AM +0100, Matus UHLAR - fantomas wrote:
> >>On 11.12.19 11:43, Henrik K wrote:
> >>>Wow 6 million tokens.. :-)
> >>>
> >>>I assume the big uuencoded blob content-type is text/* since it's
> >>>tokenized?
>
> >>yes, I mentioned that in previous mails. ~15M file, uuencoded in ~20M mail.
> >>
> >>grep -c '^M' spamassassin-memory-error-<...>
> >>329312
> >>
> >>One of former mails mentioned that 20M mail should use ~700M of RAM. 6M
> >>tokens eating about 4G of RAM means ~750B per token, is that fine?
>
> On 11.12.19 12:07, Henrik K wrote:
> >I'm pretty sure the Bayes code does many dumb things with the tokens
> >that result in much memory usage for abnormal cases like this.
>
> but apparently nobody notices...

How many people even scan 20MB mails? Pretty much nobody. As you can see,
it's not safe to do before SA 3.4.3. Before that, I know at least
Amavisd-new could be configured to truncate large messages before feeding
them to SA, which was somewhat safe to do.

> >>>This will be mitigated in 3.4.3, since it will only use max 50k of the body
> >>>text (body_part_scan_size).
>
> >>will it prefer test parts and try to avoid uuencoded or base64 parts?
> >>(or maybe decode them?)
>
> >There is no change in how parts are processed. As before, "body" is
> >concatenated result of all textual parts. But in 3.4.3 atleast each part is
> >truncated to 50k. If there are several parts then it's 50+50k etc..
>
> I understand such change apparently should not be done in minor version.

It was decided to implement this in 3.4.3 to fix things just like this,
along with the major CVE fixes. Most likely people will use 3.4.3 until
eternity. I don't know when 4.0 will be released, and it will surely be
adopted very late by distributions.
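
To make the "50+50k" point concrete, here is a rough Python sketch of the per-part truncation described above. It is only an illustration, not SpamAssassin's actual code (which is Perl): the names build_body and body_size_bound are made up for this example, PART_LIMIT just stands in for body_part_scan_size, and counting characters rather than bytes is an assumption.

```python
# Illustration only: NOT SpamAssassin's implementation.
# PART_LIMIT is a hypothetical stand-in for body_part_scan_size (50k).
PART_LIMIT = 50_000


def build_body(text_parts, part_limit=PART_LIMIT):
    """Truncate each textual part to part_limit characters, then concatenate.

    Mirrors the behaviour described in the thread: "body" is the
    concatenation of all textual parts, each cut to the limit on its own.
    """
    return "".join(part[:part_limit] for part in text_parts)


def body_size_bound(num_parts, part_limit=PART_LIMIT):
    """Upper bound on the body size: the limit applies per part, not per mail."""
    return num_parts * part_limit


if __name__ == "__main__":
    # One huge uuencode-like text part plus one small ordinary part.
    parts = ["M" * 20_000_000, "short plain-text part\n"]
    body = build_body(parts)
    print(len(body), "characters kept, bound is", body_size_bound(len(parts)))
```

So a single 20M text part shrinks to 50k before tokenizing, but a mail with many textual parts can still produce a body of roughly 50k times the number of parts.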