http://bugzilla.spamassassin.org/show_bug.cgi?id=3331
------- Additional Comments From [EMAIL PROTECTED] 2004-08-14 14:46 -------

I agree that the 3 functions approach sounds good; having 1 API that receives different combinations of parameters depending on when it's called is bad API design, in my opinion.

(explanation: it's effectively 3 different APIs with different results and args depending on what method it's called from. By mixing them into one plugin API, you move the work of ensuring the right code is run from the API-design level to the user-implementation-code level; in terms of Rusty Russell's "interface simplicity spectrum" that's from level 5 to level 7.

  http://sourcefrog.net/weblog/software/aesthetics/interface-levels.html
  http://www.ozlabs.com/~rusty/ols-2003-keynote/ols-keynote-2003.html

Plus, there could be cases where users don't *want* to know about scan operation token use, just learn-op token use. This allows them to override just one of the APIs and ignore the other.)

Also: regarding 'a single "get the tokens" method' -- either the plugin implementor would have to track the state, or Bayes.pm would have to track the same state. I think it's better to let the plugin implementor do it. However, here's a good way to avoid making it too hard -- add these APIs:

  - bayes_start_message($isspam, $generatedmsgid)
    called when a single message is about to be learned

  - bayes_scan_tokens($toksref, $msgatime)
    pass the list of tokens from that message, for a scan op (do we have $msgatime here?)

  - bayes_learn_tokens($toksref, $msgatime)
    pass the list of tokens from that message, for a learn op

  - bayes_forget_tokens($toksref)
    pass the list of tokens from that message, for a forget op ($msgatime is irrelevant)

  - bayes_finish_message()
    called when that single message learning op has been completed

The bayes_start_message() and bayes_finish_message() APIs allow the plugin to know when a new message is being operated on.
So the call order would be, for example:

  $plugin->bayes_start_message(...);
  $plugin->bayes_learn_tokens(...);     # dump the tokens
  $plugin->bayes_finish_message();

  $plugin->bayes_start_message(...);
  $plugin->bayes_forget_tokens(...);    # msg was previously learned
                                        # dump the tokens
  $plugin->bayes_learn_tokens(...);     # we've already dumped 'em, not again
  $plugin->bayes_finish_message();

So an implementor would just have to keep a single boolean that's set to 0 during bayes_start_message() and set to 1 during a foo_tokens() call -- then if foo_tokens() finds it already set to 1, it doesn't dump the tokens, because that's already been done once on that msg.

Theo: dump plugin APIs would be useful, good point -- maybe let dump_data() rewrite the line arbitrarily. Suggest something like this API:

  $line = dump_data({
      origline    => ...,
      prob        => ...,
      ts          => ...,
      th          => ...,
      atime       => ...,
      encoded_tok => ...
  })

(in other words, give it the *unstringified* data as well, so that it doesn't have to parse them out of the line.)

And then that's probably enough APIs for now ;)
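
To make the single-boolean idea concrete, here is a rough, self-contained sketch of what an implementor might write. It is only an illustration: the bayes_* hook names are the ones proposed in this comment and do not exist in SpamAssassin today, and the package name My::TokenDumper, the _dump_once() helper and the in-memory 'out' list are all made up for the example. It also calls the hooks by hand rather than from Bayes.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical plugin. The bayes_* method names follow the proposal in this
# comment; they are not an existing SpamAssassin plugin API.
package My::TokenDumper;

sub new {
    my ($class) = @_;
    return bless { dumped => 0, msgid => '', out => [] }, $class;
}

# A new message is starting: reset the "already dumped" flag to 0.
sub bayes_start_message {
    my ($self, $isspam, $msgid) = @_;
    $self->{dumped} = 0;
    $self->{msgid}  = $msgid;
}

# Shared helper (made up for this sketch): record the tokens at most once
# per message, then set the flag to 1 so later calls are no-ops.
sub _dump_once {
    my ($self, $toksref) = @_;
    return if $self->{dumped};
    $self->{dumped} = 1;
    push @{ $self->{out} }, "$self->{msgid}: @$toksref";
}

sub bayes_learn_tokens {
    my ($self, $toksref, $msgatime) = @_;
    $self->_dump_once($toksref);
}

sub bayes_forget_tokens {
    my ($self, $toksref) = @_;
    $self->_dump_once($toksref);
}

# Nothing to clean up in this sketch.
sub bayes_finish_message { }

package main;

my $plugin = My::TokenDumper->new;

# Plain learn: tokens are dumped by the learn call.
$plugin->bayes_start_message(1, 'msg-a');
$plugin->bayes_learn_tokens([qw(foo bar)], time);
$plugin->bayes_finish_message();

# Relearn: the forget call dumps, the following learn call is skipped.
$plugin->bayes_start_message(0, 'msg-b');
$plugin->bayes_forget_tokens([qw(baz qux)]);
$plugin->bayes_learn_tokens([qw(baz qux)], time);
$plugin->bayes_finish_message();

print "$_\n" for @{ $plugin->{out} };
```

Run as-is, this prints one line per message ("msg-a: foo bar" and "msg-b: baz qux"), even though the second message went through both a forget and a learn call. Because bayes_scan_tokens() is a separate hook, a plugin like this one can simply leave it out if it only cares about learn-op token use.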
