http://bugzilla.spamassassin.org/show_bug.cgi?id=2266





------- Additional Comments From [EMAIL PROTECTED]  2004-04-28 07:50 -------
Subject: Re:  New features: Tokens in report, status of Bayesian classification.

On Wed, Apr 28, 2004 at 07:27:14AM -0700, [EMAIL PROTECTED] wrote:
> 
> And so instead of
>       my %tokens = map { substr(sha1($_), -5) => 1 } grep(length, @tokens);
> do
>       my %tokens = map { substr(sha1($_), -5) => $_ } grep(length, @tokens);
> 
> and then either return the hash table (requiring changes to callers of
> tokenize) or else store it somewhere.  The code in scan can then use
> the hash table to retrieve the original text for each token to be
> displayed.
> 

This is pretty much my thinking, except with something like this:
      my %tokens = map { substr(sha1($_), -5) => {'orig' => $_} } grep(length, @tokens);

Then, in scan, I can add the counts, atime and prob to each token's entry,
and use those entries later when building up the hammy/spammy arrays.
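
A rough sketch of how that could fit together, not the actual patch. The
field names (spam, ham, atime, prob), the %fake_db lookup table and the 0.5
cutoff are all made up for illustration; the real scan would read the
per-token data from the Bayes database. Digest::SHA is used here only because
it ships with current Perls and exports a compatible sha1().

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1);

# Hypothetical token list standing in for what tokenize() collects.
my @tokens = ('viagra', '', 'meeting', 'free');

# Key: last 5 bytes of the SHA-1 digest.
# Value: a hash ref that keeps the original text and can grow later.
my %tokens = map { substr(sha1($_), -5) => { orig => $_ } } grep(length, @tokens);

# Hypothetical per-token statistics (assumption: invented values).
my %fake_db = (
    viagra  => { spam => 40, ham => 1,  atime => 1083100000, prob => 0.98 },
    meeting => { spam => 2,  ham => 35, atime => 1083100000, prob => 0.05 },
);

my (@spammy, @hammy);
for my $key (keys %tokens) {
    my $entry = $tokens{$key};
    my $stats = $fake_db{ $entry->{orig} } or next;   # no data for this token: skip it

    # Add the counts, atime and prob onto the existing entry.
    @{$entry}{qw(spam ham atime prob)} = @{$stats}{qw(spam ham atime prob)};

    # Assumed cutoff of 0.5, purely for the example.
    if ($entry->{prob} > 0.5) { push @spammy, $entry }
    else                      { push @hammy,  $entry }
}

printf "spammy: %s\n", join(', ', sort map { $_->{orig} } @spammy);
printf "hammy:  %s\n", join(', ', sort map { $_->{orig} } @hammy);
```

Since each entry is a hash ref, the report code can get at the original text
and the stats from the same place without another lookup.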

Michael





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
