On Mon, 2014-10-06 at 09:03 -0400, jdime abuse wrote:
> I have been seeing some issues with bayes detection from base64
> strings within attachments causing false positives.
> 
> Example:
> Oct  6 09:02:14.374 [15869] dbg: bayes: token 'H4f' => 0.999971186828264
> Oct  6 09:02:14.374 [15869] dbg: bayes: token 'wx2' => 0.999968644662127
> Oct  6 09:02:14.374 [15869] dbg: bayes: token 'z4f' => 0.999968502147581
> Oct  6 09:02:14.378 [15869] dbg: bayes: token '0vf' => 0.999966604823748
> 
> Is there a solution to prevent triggering bayes from the base64 data
> in an attachment? It was my impression that attachments should not
> trigger bayes data, but it seems that it is parsing it as text rather
> than an attachment.

Bayes tokens are basically taken from rendered, textual body parts (and
mail headers). Attachments are not tokenized.

So unless the message's MIME structure is severely broken, these tokens
come from somewhere other than a base64-encoded attachment. Can you
upload a sample to a pastebin?
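
In the meantime, a quick way to see how the message's MIME structure
parses is to list its leaf parts with their declared type, disposition
and transfer encoding. The following is only a rough sketch using
Python's standard email package, not SpamAssassin's own parser or its
tokenization rules. The function name list_parts, the SAMPLE message and
the "text/* means textual" check are assumptions made for illustration.

```python
import email
from email import policy


def list_parts(raw_bytes):
    """Return (content_type, disposition, transfer_encoding, looks_textual)
    for every non-multipart part of a raw RFC 5322 message."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    rows = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body of their own
        ctype = part.get_content_type()
        disp = part.get_content_disposition()  # 'attachment', 'inline' or None
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
        # Assumption: a part declared as text/* is what a filter would
        # treat as a textual body part. This is a simplification.
        looks_textual = part.get_content_maintype() == "text"
        rows.append((ctype, disp, cte, looks_textual))
    return rows


# Hypothetical sample: one plain-text body plus one base64 attachment.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="b"\r\n'
    b"\r\n"
    b"--b\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--b\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b'Content-Disposition: attachment; filename="x.bin"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"SDRmd3gyejRm\r\n"
    b"--b--\r\n"
)

for row in list_parts(SAMPLE):
    print(row)
```

If a part carrying base64 data shows up with a text/* type, or the
message parses as a single text part with the base64 inline, that would
point to the kind of broken structure described above.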


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
