After reading your reply, I re-examined the message and found the cause was
an incorrect Content-Type on the PDF attachment:
~~~
Content-Type: text/plain; charset=windows-1250;
 name="pdfname.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="pdfname.pdf"
~~~

Because the part is declared as text/plain, its base64 data was being scanned
as text and tokenized.
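
For anyone hitting the same thing, here is a rough sketch of how such parts
could be spotted before they reach the filter. It uses only Python's standard
email package and is not anything the filter itself provides. The sample
message, the function name mislabeled_text_parts, and the extension allow-list
are my own assumptions for illustration, and the base64 payload is a
placeholder rather than the real attachment.

```python
from email import message_from_string

# Hypothetical sample message; only the headers of the second part come
# from the real mail, everything else is made up for the demonstration.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="b1"

--b1
Content-Type: text/plain; charset=us-ascii

See attached.
--b1
Content-Type: text/plain; charset=windows-1250;
 name="pdfname.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="pdfname.pdf"

JVBERi0xLjQK
--b1--
"""

# Assumed allow-list of extensions that plausibly belong to a text/* part.
TEXT_EXTENSIONS = (".txt", ".htm", ".html")


def mislabeled_text_parts(msg):
    """Return filenames of parts declared text/* whose name suggests a binary."""
    suspects = []
    for part in msg.walk():
        filename = part.get_filename()
        if (
            part.get_content_maintype() == "text"
            and filename
            and not filename.lower().endswith(TEXT_EXTENSIONS)
        ):
            suspects.append(filename)
    return suspects


suspects = mislabeled_text_parts(message_from_string(SAMPLE))
print(suspects)
```

The check is only a heuristic: it flags any named text/* part whose extension
is not on the allow-list, so the list would need adjusting for real traffic.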

On Mon, Oct 6, 2014 at 3:28 PM, Karsten Bräckelmann <guent...@rudersport.de>
wrote:

> On Mon, 2014-10-06 at 09:03 -0400, jdime abuse wrote:
> > I have been seeing some issues with bayes detection from base64
> > strings within attachments causing false positives.
> >
> > Example:
> > Oct  6 09:02:14.374 [15869] dbg: bayes: token 'H4f' => 0.999971186828264
> > Oct  6 09:02:14.374 [15869] dbg: bayes: token 'wx2' => 0.999968644662127
> > Oct  6 09:02:14.374 [15869] dbg: bayes: token 'z4f' => 0.999968502147581
> > Oct  6 09:02:14.378 [15869] dbg: bayes: token '0vf' => 0.999966604823748
> >
> > Is there a solution to prevent triggering bayes from the base64 data
> > in an attachment? It was my impression that attachments should not
> > trigger bayes data, but it seems that it is parsing it as text rather
> > than an attachment.
>
> Bayes tokens are basically taken from rendered, textual body parts (and
> mail headers). Attachments are not tokenized.
>
> Unless the message's MIME-structure is severely broken, these tokens
> appear somewhere other than a base64 encoded attachment. Can you provide
> a sample uploaded to a pastebin?
>
>
> --
> char *t="\10pse\0r\0dtu\0.@ghno
> \x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
> main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8?
> c<<=1:
> (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0;
> }}}
>
>
