I have a development idea.  How about having the tokens db store not only the
hash and frequency, but also the actual plaintext string?  The string would only be 
used for database dumps and reports, while the hash would be used for the 
actual matching and scoring.
I think this would give the best of both worlds; the only potential issue is 
privacy.  Given that words aren't associated with user accounts or messages in 
the DB, I don't really see any merit to the privacy argument.
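
To make the idea concrete, here is a rough sketch in Python. It is not SA's
actual code or schema: the names TokenStore, TokenRecord and token_hash are
made up for illustration, and the truncated SHA-1 hash is only an assumption
standing in for whatever hash the real tokens db uses. Lookups and counting go
through the hash; the plaintext is only read when dumping.

```python
import hashlib
from dataclasses import dataclass


def token_hash(token: str) -> bytes:
    # Assumption: a truncated SHA-1 digest stands in for the real db's hash.
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:]


@dataclass
class TokenRecord:
    spam_count: int = 0
    ham_count: int = 0
    plaintext: str = ""  # kept only for dumps and reports, never for scoring


class TokenStore:
    """Hypothetical token table keyed by hash, with plaintext carried along."""

    def __init__(self):
        self._records = {}

    def learn(self, token: str, is_spam: bool) -> None:
        # Matching is done on the hash; the plaintext is stored on first sight.
        key = token_hash(token)
        rec = self._records.setdefault(key, TokenRecord(plaintext=token))
        if is_spam:
            rec.spam_count += 1
        else:
            rec.ham_count += 1

    def counts(self, token: str):
        # Scoring-side lookup: uses only the hash and the frequencies.
        rec = self._records.get(token_hash(token))
        return (rec.spam_count, rec.ham_count) if rec else (0, 0)

    def dump(self):
        # Report-side view: the only place the plaintext is read.
        return sorted(
            (rec.plaintext, rec.spam_count, rec.ham_count)
            for rec in self._records.values()
        )
```

The extra cost is one string per token in the db, which is the trade-off
against having readable dumps.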

JP

Matt Kettler wrote ..
> At 04:42 PM 10/29/2004, [EMAIL PROTECTED] wrote:
> >Thanks for the responses.  Good explanations that make perfect sense.
> >SO.. now that I'm past the hex-in-db issue, I clearly do have some issue
> >nonetheless.  The following spam got through with a score of -4.3, 
> >seemingly because of the AWL.  My AWL, however is empty per 
> >tools/check_whitelist.  How could this have happened:
> 
> 1) I don't see the AWL being generated by YOUR version of SA.. I see it
> being generated by someone who is using a DIFFERENT version of SA...
> 
> >         X-Spam-Checker-Version: SpamAssassin 8.2-spambr_6119620U on 
> > tradeexperts.com
> >         X-Spam-Level:
> >         X-Spam-Status: No, hits=-4.3 required=3.0 tests=AWL,NO_REAL_NAME
> > autolearn=no version=4.8-spambr_398464947C
> 
> That's not you.. you're not running a "spambr" variant of SA.
> 
> I'd double-check and make sure you're not doing something like bypassing
> all mail that has an X-Spam-Status header.. that's a sure-fire way to be
> abused by spammers as above.
> 
> 2) YOUR system is generating these hits (from your debug output)
> 
> >debug: 
> >tests=ALL_TRUSTED,BAYES_60,MISSING_HEADERS,MISSING_SUBJECT,NO_REAL_NAME
> 
> YOUR problem seems to be the hit of ALL_TRUSTED.
> 
> >debug: metadata: X-Spam-Relays-Trusted:
> >debug: metadata: X-Spam-Relays-Untrusted:
> 
> That looks like SA being unable to parse any of the Received: headers in
> the message.. Not so good.
