On Thu, Aug 12, 2004 at 12:08:22AM -0700, Justin Mason wrote:
> 
> currently in 3.0.0 we don't support "sa-learn --dump" containing readable
> token data anymore... there's a patch in
> http://bugzilla.spamassassin.org/show_bug.cgi?id=3331 to restore this
> capability.  However, it slows down bayes scanning and learning
> quite a bit recording that data as well.
> 
> What do people think?  is this functionality being removed a serious
> issue?
> 

Before commenting, I would encourage everyone to take a look at the
performance spreadsheet attached to the bug, and to keep this in mind:
the change is not necessarily a win-win.

The bayes storage code is very sensitive to small changes.  For 3.0 we
decided to hash all token values and only store the hash.  This gave
us fairly large savings in disk space and a boost in speed.  The patch
in bug 3331, even in the case where you would not store the original
(readable) tokens, causes a 3-4% slowdown across the board.
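
To make the trade-off concrete, here is a rough sketch of what "hash the
token and only store the hash" looks like.  It is not the SpamAssassin
code: the names token_key, learn and HASH_BYTES are made up for this
example, and the choice of SHA-1 truncated to 5 bytes is an assumption
for illustration only.

```python
import hashlib

HASH_BYTES = 5  # assumed truncation length, for illustration only


def token_key(token: str) -> bytes:
    """Return a fixed-size key for a token; the readable text is not kept."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return digest[-HASH_BYTES:]


def learn(db: dict, tokens, is_spam: bool) -> None:
    """Increment per-token (spam, ham) counts, keyed by the hash only."""
    for tok in tokens:
        key = token_key(tok)
        spam, ham = db.get(key, (0, 0))
        db[key] = (spam + 1, ham) if is_spam else (spam, ham + 1)


# Hypothetical in-memory stand-in for the bayes database.
db = {}
learn(db, ["viagra", "free", "meeting"], is_spam=True)
learn(db, ["meeting", "agenda"], is_spam=False)

# A dump can only show the hashed keys, not the original tokens.
for key, (spam, ham) in db.items():
    print(key.hex(), spam, ham)
```

Every key has the same small size no matter how long the token was,
which is where the space saving comes from.  It is also why a dump can
no longer show readable tokens: getting them back means storing the
original text alongside the hash, and that is extra work on every
learn and scan.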

I like the patch.  I don't believe it makes the code that much harder
to maintain or adds undue complexity.

I was very apprehensive about removing the ability to view readable
token values.  However, I haven't found a need or desire to view the
token values since the hashing code went in, and I'm no worse off.

I have yet to hear a concrete reason for needing to view the values.

Michael
