After reading your reply, I re-examined the message and found the cause was an incorrect Content-Type:

~~~
Content-Type: text/plain; charset=windows-1250; name="pdfname.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="pdfname.pdf"
~~~
So it was scanning the base64 as text and tokenizing it.

On Mon, Oct 6, 2014 at 3:28 PM, Karsten Bräckelmann <guent...@rudersport.de> wrote:

> On Mon, 2014-10-06 at 09:03 -0400, jdime abuse wrote:
> > I have been seeing some issues with bayes detection from base64
> > strings within attachments causing false positives.
> >
> > Example:
> > Oct 6 09:02:14.374 [15869] dbg: bayes: token 'H4f' => 0.999971186828264
> > Oct 6 09:02:14.374 [15869] dbg: bayes: token 'wx2' => 0.999968644662127
> > Oct 6 09:02:14.374 [15869] dbg: bayes: token 'z4f' => 0.999968502147581
> > Oct 6 09:02:14.378 [15869] dbg: bayes: token '0vf' => 0.999966604823748
> >
> > Is there a solution to prevent triggering bayes from the base64 data
> > in an attachment? It was my impression that attachments should not
> > trigger bayes data, but it seems that it is parsing it as text rather
> > than an attachment.
>
> Bayes tokens are basically taken from rendered, textual body parts (and
> mail headers). Attachments are not tokenized.
>
> Unless the message's MIME-structure is severely broken, these tokens
> appear somewhere other than a base64 encoded attachment. Can you provide
> a sample uploaded to a pastebin?
>
>
> --
> char *t="\10pse\0r\0dtu\0.@ghno
> \x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
> main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8?
> c<<=1:
> (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0;
> }}}
>
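
P.S. For anyone who runs into the same thing: below is a rough sketch, using only Python's standard `email` module, of how one might spot parts that are declared `text/*` but carry a binary-looking filename, like the part above. It is not how SpamAssassin decides what to tokenize. The helper name `find_mislabeled_parts`, the extension list, and the sample message (addresses, boundary, payload) are all assumptions made up for illustration.

```python
from email import message_from_string

# Illustrative assumption: extensions we treat as "binary".
# This list is made up for the example; it is not taken from SpamAssassin.
BINARY_EXTENSIONS = (".pdf", ".zip", ".doc", ".docx", ".xls", ".xlsx", ".jpg", ".png")


def find_mislabeled_parts(msg):
    """Return (content_type, filename) for text/* parts whose filename looks binary."""
    suspects = []
    for part in msg.walk():
        # Only parts declared as text/* are of interest here.
        if part.get_content_maintype() != "text":
            continue
        # get_filename() reads Content-Disposition's filename parameter and
        # falls back to the Content-Type name parameter.
        filename = part.get_filename()
        if filename and filename.lower().endswith(BINARY_EXTENSIONS):
            suspects.append((part.get_content_type(), filename))
    return suspects


# Hypothetical sample message; only the second part's headers mirror the
# ones quoted in this thread.
SAMPLE = """\
From: sender@example.com
To: rcpt@example.com
Subject: test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset=us-ascii

Hello.
--XYZ
Content-Type: text/plain; charset=windows-1250; name="pdfname.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="pdfname.pdf"

JVBERi0xLjQK
--XYZ--
"""

if __name__ == "__main__":
    for ctype, name in find_mislabeled_parts(message_from_string(SAMPLE)):
        print(f"{ctype} part carries {name!r}")
```

Running something like this over a saved copy of a false-positive message should show quickly whether the sender's MUA mislabeled an attachment as text.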