On Mon, 2014-10-06 at 09:03 -0400, jdime abuse wrote:
> I have been seeing some issues with bayes detection from base64
> strings within attachments causing false positives.
>
> Example:
> Oct 6 09:02:14.374 [15869] dbg: bayes: token 'H4f' => 0.999971186828264
> Oct 6 09:02:14.374 [15869] dbg: bayes: token 'wx2' => 0.999968644662127
> Oct 6 09:02:14.374 [15869] dbg: bayes: token 'z4f' => 0.999968502147581
> Oct 6 09:02:14.378 [15869] dbg: bayes: token '0vf' => 0.999966604823748
>
> Is there a solution to prevent triggering bayes from the base64 data
> in an attachment? It was my impression that attachments should not
> trigger bayes data, but it seems that it is parsing it as text rather
> than an attachment.
Bayes tokens are taken from the rendered, textual body parts (and the
mail headers). Attachments are not tokenized.

Unless the message's MIME structure is severely broken, these tokens
come from somewhere other than a base64-encoded attachment. Can you
upload a sample to a pastebin and post the link?
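To narrow down where a token like 'H4f' comes from, the idea above can be approximated outside SpamAssassin. The following is a minimal sketch using Python's standard-library email package, not SpamAssassin's own (Perl) code. It decodes only inline text/* parts and treats any non-text part, or any part marked "Content-Disposition: attachment", as an attachment; that rule is a simplifying assumption, not SpamAssassin's exact logic. The function names textual_parts and token_in_text and both sample messages are hypothetical.

```python
from email import message_from_string
from email.message import Message


def textual_parts(msg: Message) -> list[str]:
    """Decode inline text/* parts; skip containers, non-text parts and attachments."""
    texts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers hold no body text themselves
        if part.get_content_maintype() != "text":
            continue  # e.g. application/octet-stream: treated as an attachment
        if part.get_content_disposition() == "attachment":
            continue  # explicitly marked attachment (assumed to be skipped)
        payload = part.get_payload(decode=True) or b""  # undoes base64/QP
        charset = part.get_content_charset() or "us-ascii"
        texts.append(payload.decode(charset, errors="replace"))
    return texts


def token_in_text(raw: str, token: str) -> bool:
    """True if `token` occurs in the decoded textual parts of `raw`."""
    msg = message_from_string(raw)
    return any(token in text for text in textual_parts(msg))


# Hypothetical well-formed message: the base64 blob lives in an attachment.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset=us-ascii

Please see the attached file.
--XYZ
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="data.bin"
Content-Transfer-Encoding: base64

H4fwx2z4f0vf
--XYZ--
"""

# Hypothetical broken message: the same blob arrives as plain body text.
BROKEN = """\
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

H4fwx2z4f0vf
"""
```

With RAW, token_in_text(RAW, "H4f") is False even though the string occurs in the raw source, while with BROKEN it is True: the blob is only seen as text when the MIME structure fails to mark it as an attachment, which is why a sample of the actual message is needed.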