On Thu, Aug 12, 2004 at 12:08:22AM -0700, Justin Mason wrote:
>
> currently in 3.0.0 we don't support "sa-learn --dump" containing readable
> token data anymore... there's a patch in
> http://bugzilla.spamassassin.org/show_bug.cgi?id=3331 to restore this
> capability. However, it slows down bayes scanning and learning
> quite a bit recording that data as well.
>
> What do people think? is this functionality being removed a serious
> issue?
>
Before commenting, I would encourage everyone to keep the following in mind, and maybe take a look at the performance spreadsheet attached to the bug: this change is not necessarily a win-win.

The bayes storage code is very sensitive to small changes. For 3.0 we decided to hash all token values and store only the hash. This gave us a fairly large boost in disk space savings and speed. The patch in 3331, even in the case where you would not store the original (readable) tokens, causes a 3-4% slowdown across the board.

I like the patch; I don't believe it makes the code that much harder to maintain or adds undue complexity.

I was very apprehensive about removing the ability to view readable token values. However, I haven't found a need or desire to view the token values since putting the code in, and I'm no worse off. I have yet to hear a concrete reason for needing to view the values.

Michael
