[Bug 3077] spamassassin -d is too damn slow

bugzilla-daemon 26 Feb 2004 05:09:07 -0000

http://bugzilla.spamassassin.org/show_bug.cgi?id=3077






------- Additional Comments From [EMAIL PROTECTED]  2004-02-25 21:08 -------
Subject: RE:  spamassassin -d is too damn slow




> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 25, 2004 8:29 PM
[...]
> 
> 
> ------- Additional Comments From [EMAIL PROTECTED]  2004-02-25 
> 20:29 -------
> well...   I don't want yet another tool, and I don't want one 
> tool calling another tool.  there was a ticket 
> open about "spamassassin does too much stuff", which was 
> basically asking that it be split into a 
> different tool per command type.
>

I don't have a strong opinion on this, except that like any user,
I'd like all operations to be speedy. In one-message-at-a-time mode,
'spamassassin -d' is not executed often, and its slow
execution time is probably not a big deal. But when users start operating
on large collections of messages, usually to build a corpus,
the overhead dominates and _is_ noticeable.

Personally, I'd have no problem with leaving 'spamassassin -d'
as is, while having easy access to an efficient tool that does
only what it needs to do to remove markup, and does this
en masse (on an mbox or mail folder). Maybe bits and pieces
of mass-check can be utilized for this. Maybe the resulting
tool ends up in the contrib directory, or even the masses directory?

> imo, we can make things faster if we eliminate module loading 
> until necessary.  For instance, 
> 'spamassassin -d' needs M::SA, which reads in _EVERYTHING_: 
> Bayes, Plugin handlers, ConfLDAP, 
> ConfSQL...  We also need M::SA::Message which uses MsgNode, which 
> uses MIME::Base64, MIME::
> QuotedPrintable, M::SA::HTML (which uses HTML::Parser, etc.) ...
> 

Some tests would have to be run, but a case can likely be made
that only loading modules as required would improve overall performance.

> Essentially, we snow ball big time.  So why not yet another tool? 
>  Because we want to avoid code 
> replication, which means that code will call 
> M::SA::remove_spamassassin_markup()...  Which loads all 
> the stuff above.  So there'd be no gain there anyway.

Objectively, remove_spamassassin_markup() needs access only
to the mail headers, certain config parameters (the rewrite tags),
and certain parts of the body (the top-level attachments).
Of course, loading the config vars might require the various
database modules if, for example, the config data is recorded
in a database.
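
To make the point concrete, here is a sketch of the header-only
part of the job, in Python and using only its standard library.
It is not how remove_spamassassin_markup() is implemented. The
function name strip_markup() is hypothetical, and the "X-Spam-"
header prefix and the "*****SPAM*****" subject tag are assumptions
about the local configuration. It ignores the case where the
original message was wrapped as an attachment.

```python
import email
from email import policy

def strip_markup(raw_bytes, subject_tag="*****SPAM*****"):
    """Return the parsed message with spam markup headers removed."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.compat32)

    # Drop every header whose name starts with the assumed prefix.
    for name in list(msg.keys()):
        if name.lower().startswith("x-spam-"):
            del msg[name]

    # Undo the assumed subject rewrite tag, if it is present.
    subject = msg.get("Subject")
    if subject is not None and subject.startswith(subject_tag):
        msg.replace_header("Subject", subject[len(subject_tag):].lstrip())

    return msg
```

Nothing in this needs a Bayes store, an HTML parser, or a database
driver, unless the tag itself has to be read from one.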

But there's still a missing capability in all this: the ability
to iterate over an entire mailbox and/or directory. If the tool
can do that, then the overhead of loading the modules is paid
once per run rather than once per message, and will likely be
lost in the noise.

Thus, adding an --mbox or --maildir switch might fix the
'spamassassin -d' performance problem ....
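
A sketch of the shape such a switch could take, again in Python
with its standard mailbox module. The names process_mbox() and
drop_spam_headers() are hypothetical, and the "X-Spam-" prefix is
an assumption. The point is only that startup happens once and
the per-message work runs in a loop.

```python
import mailbox

def drop_spam_headers(msg):
    """Per-message work: remove headers with the assumed prefix."""
    for name in list(msg.keys()):
        if name.lower().startswith("x-spam-"):
            del msg[name]
    return msg

def process_mbox(src_path, dst_path, clean):
    """Apply `clean` to each message in src_path, append to dst_path.

    Returns the number of messages processed. A real tool should
    also lock both mailboxes; that is left out of this sketch.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    count = 0
    try:
        for msg in src:
            dst.add(clean(msg))
            count += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return count
```

With this structure, one invocation handles a whole corpus, so
the module-loading cost is the same for one message or thousands.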





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
