http://bugzilla.spamassassin.org/show_bug.cgi?id=3077
------- Additional Comments From [EMAIL PROTECTED]  2004-02-25 21:08 -------
Subject: RE: spamassassin -d is too damn slow

> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 25, 2004 8:29 PM
[...]
>
> ------- Additional Comments From [EMAIL PROTECTED]  2004-02-25 20:29 -------
> well... I don't want yet another tool, and I don't want one
> tool calling another tool.  there was a ticket open about
> "spamassassin does too much stuff", which was basically asking
> that it be split into a different tool per command type.

I don't have a strong opinion on this, except that, like any user, I'd like all operations to be speedy. In one-message-at-a-time mode, 'spamassassin -d' is not executed often, and its slow execution time is probably not a big deal. But when users start operating on large collections of messages, usually to build a corpus, the overhead dominates and _is_ noticeable.

Personally, I'd have no problem with leaving 'spamassassin -d' as is, provided there is easy access to an efficient tool that does only what it needs to do to remove markup, and can do this en masse (on an mbox or mail folder). Maybe bits and pieces of mass-check can be utilized for this. Maybe the resulting tool ends up in the contrib directory, or even the masses directory?

> imo, we can make things faster if we eliminate module loading
> until necessary.  For instance, 'spamassassin -d' needs M::SA,
> which reads in _EVERYTHING_: Bayes, Plugin handlers, ConfLDAP,
> ConfSQL...  We also need M::SA::Message which uses MsgNode, which
> uses MIME::Base64, MIME::QuotedPrintable, M::SA::HTML (which uses
> HTML::Parser, etc.) ...

Some tests would have to be run, but a case can likely be made that loading modules only as required would improve overall performance.

> Essentially, we snow ball big time.  So why not yet another tool?
> Because we want to avoid code replication, which means that code
> will call M::SA::remove_spamassassin_markup()...  Which loads all
> the stuff above.  So there'd be no gain there anyway.

Objectively, remove_spamassassin_markup() needs access only to the mail headers, certain config parameters (the rewrite tags), and certain parts of the body (the top-level attachments). Of course, loading the config vars might require the various database modules, for example if the config data is stored in a database.

But there's still a missing capability in all this: the ability to sequence over an entire mailbox and/or directory. If the tool can do that, then the overhead of loading the modules will likely be lost in the noise. Thus, adding an --mbox or --maildir switch might fix the 'spamassassin -d' performance problem ....

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

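The mailbox-sequencing idea can be sketched the same way: start once, then walk every message in an mbox, so that start-up cost is paid a single time instead of once per message. This is again an illustrative Python sketch using the standard `mailbox` module, not the proposed --mbox switch itself. The function names are hypothetical, and the cleanup is deliberately reduced to deleting X-Spam-* headers; as noted in the comment above, the real routine also has to deal with the rewrite tags and the top-level attachments.

```python
import mailbox
import os
import tempfile


def strip_spam_headers(msg):
    """Delete every X-Spam-* header from msg; return True if any were removed."""
    names = {n for n in msg.keys() if n.lower().startswith("x-spam-")}
    for name in names:
        del msg[name]  # removes all occurrences of that header name
    return bool(names)


def clean_mbox(path):
    """Strip X-Spam-* headers from every message in the mbox at path.

    Returns (total_messages, messages_changed).
    """
    box = mailbox.mbox(path)
    total = changed = 0
    box.lock()
    try:
        for key in box.keys():
            msg = box[key]
            total += 1
            if strip_spam_headers(msg):
                box[key] = msg  # write the cleaned message back
                changed += 1
        box.flush()
    finally:
        box.unlock()
        box.close()
    return total, changed
```

Calling `clean_mbox("corpus.mbox")` (the file name is only an example) processes the whole folder in one process, which is the situation in which the module-loading overhead should be lost in the noise.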