Re: bayes_ignore_header policy?

2022-05-10 Thread Loren Wilton
On 2022-05-10 06:44, Henrik K wrote: On Sun, May 08, 2022 at 11:29:29AM -0400, Bill Cole wrote: I have not researched all of those, but I believe that some of those should in theory be useful in Bayes. So is someone going to research them then? And the 268 older headers that axb already

Re: File::Spec?

2022-04-29 Thread Loren Wilton
Windows NT (that is, any kind of current Windows) file functions will natively accept either \ or / equivalently in file paths. There is an option to disable the acceptance of /, but almost nobody knows it exists, and I can't imagine anyone setting it on a rational system. The main
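The separator point can be checked without a Windows host. This is a minimal sketch in Python rather than Perl's File::Spec, relying on the standard library's `PureWindowsPath`, which models Windows path rules on any platform; the path itself is hypothetical.

```python
from pathlib import PureWindowsPath


def same_windows_path(a: str, b: str) -> bool:
    """Compare two path strings under Windows path rules."""
    return PureWindowsPath(a) == PureWindowsPath(b)


# Hypothetical path, written once with each separator.
forward = "C:/Users/example/rules/local.cf"
backward = r"C:\Users\example\rules\local.cf"

print(same_windows_path(forward, backward))
```

Both spellings parse to the same components, which is the equivalence described above; this says nothing about the option that disables `/`.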

Re: spamassassin 3.4.5 wide chars

2021-09-05 Thread Loren Wilton
I don't recall your problem, but note that 3.4.6 was a very hasty update to 3.4.5 to correct some problems that showed up with some rules a day or so after 3.4.5 was created. If things worked before 3.4.5 and fail in rules with 3.4.5, I'd suggest that 3.4.6 may be the correct solution.

Re: [Bug 7908] Domain PRO is treated as spam

2021-05-12 Thread Loren Wilton
...yer preachin' t'th'choir here, Loren. :) Yea, sorry about that. I realized that about 3 minutes after I hit send, but didn't want to pollute the mailing list by re-posting it as a bug comment. You did a much more diplomatic job anyway. :-) Loren

Re: [Bug 7908] Domain PRO is treated as spam

2021-05-12 Thread Loren Wilton
Let me see if I can explain this problem in simple words: 1) The SA project develops rules. 2) From time to time (almost daily) the SA project updates those rules. 3) The problem you are reporting has been FIXED LONG AGO in the rules in the SA project. 4) Administrators at various mail sites

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-08 Thread Loren Wilton
An alternative approach is creating new strings from parsed data: string TO_BODY = TO:addr ":" BODY(500) string TO_BODY ~= // the advantage of this is that there are no dependencies. I'm thinking that BODY(500) would be a multi-line string constructed from the first 500 bytes of the rendered

Re: [Bug 4549] un-run dependencies should not trigger meta rules

2021-05-08 Thread Loren Wilton
And if the meta is depending on multiple unfinished rules, or even other metas with unfinished rules? Sounds like a logic nightmare.. better just design the metas better in the first place.. Seems to me the logic would be moderately straight-forward if it was driven out of the end of net

Re: [Bug 4549] un-run dependencies should not trigger meta rules

2021-05-08 Thread Loren Wilton
Henrik (or anyone) what happens if a net rule is fired but no response is ever received? I assume there is some timeout limit for net responses to show up, so that message processing can complete? If that is true (there is a timeout on responses) then the situation in bug 4549 still exists

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread Loren Wilton
I guess the risk is exactly the same as rulenames colliding.. better not use very generic names and you can always prepend the rulename yourself. :-) My other concern is that as far as I know, SA rules are still limited to a single line of text. If the rule name plus item name gets long, the

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread Loren Wilton
Perl already has named capture groups as legit syntax, so it would be most simple to actually use them. https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern) header FROM_NAME /^From: "(?\w+)/ Good. I thought there was something there, but I didn't remember the exact syntax and was too lazy to
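For reference, a named capture group in action. This is a sketch in Python, which spells the construct `(?P<name>...)` where Perl uses `(?<NAME>...)`; the group name `name` and the sample header are assumptions made for illustration, not part of the proposal.

```python
import re

# Python spelling of a named group; Perl would write (?<name>\w+).
FROM_NAME = re.compile(r'^From: "(?P<name>\w+)')


def display_name(header_line: str):
    """Return the first word of a quoted From: display name, or None."""
    match = FROM_NAME.match(header_line)
    return match.group("name") if match else None


print(display_name('From: "Alice Example" <alice@example.org>'))
```

Reading the value back by name rather than by position is what makes the approach attractive for rules that share captured text.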

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-07 Thread Loren Wilton
> header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1 Would :capture play well with (e.g.) :addr, :name, :raw, etc? It might as well be a tflag or something. Why limit capturing to headers only? I hadn't intended it to be limited to headers only, but I guess the syntax

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-02 Thread Loren Wilton
Now consider variable capture from the message: header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1 The text above was intended to all appear on one line. "$(__COMPANY)=\1" followed /i.
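The intended semantics of the proposed capture can be sketched outside SpamAssassin. The Python below is only an illustration of the idea: one check stores group 1 in a per-message variable, and a later check reads it. The function names and the dict used as the variable store are hypothetical, and nothing here reflects an existing SA feature.

```python
import re


def capture_company(subject: str, variables: dict) -> bool:
    """Mimic the proposed rule: on a match, store group 1 as __COMPANY."""
    match = re.search(r"Your (\w+) Order", subject, re.IGNORECASE)
    if match:
        variables["__COMPANY"] = match.group(1)
    return match is not None


def body_mentions_company(body: str, variables: dict) -> bool:
    """A second, hypothetical rule that reuses the captured value."""
    company = variables.get("__COMPANY")
    if company is None:
        return False  # nothing was captured for this message
    return re.search(re.escape(company), body, re.IGNORECASE) is not None


per_message = {}  # fresh store for each message scanned
capture_company("Your Acme Order has shipped", per_message)
print(body_mentions_company("Thanks for shopping at ACME", per_message))
```

Escaping the captured text before reuse keeps a hostile Subject from injecting regex syntax into the second rule.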

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-02 Thread Loren Wilton
John Hardin wrote: An awful lot I think could be done simply by having rules that can capture to named per-message-global variables, and allowing those variables to be used in other (or the same) rules. I've been wanting this for years. Proposal for discussion: Consider the following

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-01 Thread Loren Wilton
Ideally rules could be written with some pseudo-language that could do complex things, grabbing things into variables, modifying, comparing to other things etc. Then there wouldn't be any need for Perl plugins doing some trivial stuff. An awful lot I think could be done simply by having rules

Re: svn commit: r1889364 - /spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/Esp.pm

2021-05-01 Thread Loren Wilton
These kinds of changes just make you wonder what's the point of doing such plugins inside SA distribution.. if we ever do get 4.0 released, I really doubt if there are enough resources in the project to even release monthly updates after that.. Given that plugins are by and large the basis for

Re: Rules missing from ruleqa results?

2020-09-16 Thread Loren Wilton
All: There are several rules in my sandbox that do not appear in the results at https://ruleqa.spamassassin.org/ (e.g. __RECEIVE_BONUS) I don't see __RECEIVE_BONUS in the bad sandbox report. I would assume it is in your lotsa_money file? All I can think of is something obvious, like it being

Re: Zero-point garbage text that isn't caught by the small-font rules

2020-09-12 Thread Loren Wilton
It's properly formed. Compare the plaintext part to the HTML part, note that the base64 block is QP'd base64, and note that there's some more QP spam pitch text after the base64 block. Ah. I completely missed the division boundary a third of the way thru, or for that matter the pdf attachment

Re: Zero-point garbage text that isn't caught by the small-font rules

2020-09-12 Thread Loren Wilton
See attached spample. Is there a boundary missing in that spample? It seems to go from a couple lines of QP text into base64 with no intervening boundary.

Re: Proposed new "alias" directive (was: Rules referencing WHITELIST or BLACKLIST in process of being Renamed)

2020-07-23 Thread Loren Wilton
The "alias" directive should not affect RE rules at all, other than perhaps removing one if the alias is defined after a RE rule having the same name was defined. I would hope another use of ALIAS would be to redirect a subsequent SCORE or DESCRIPTION directive to the renamed rule rather than

Re: Why the new changes need to be "depricated" forever

2020-07-21 Thread Loren Wilton
de on when and how to merge the branch into trunk. Such a vote should be made by the full PMC, and not by a quorum of one member. I think we know how the vote would go if only a single member can carry a vote in a meeting he convenes for the purpose just by himself. Respectfully, Loren Wilt

Re: shortcircuit for USER_IN_WHITELIST -- noautolearn?? ==learn!

2008-05-07 Thread Loren Wilton
Is there a way to clear the noautolearn for the whitelist rules? Normal rules could probably do it with tflags. Except I'm not sure that you can necessarily negate a previously set tflags value with a later tflags value. (If not, maybe it would be worth an enhancement request.) Another

Re: svn commit: r594726 - in /spamassassin/trunk: build/nightlymc/clienthosts masses/rule-qa/nightly-slave-start

2007-11-14 Thread Loren Wilton
Having far more experience than I need on multiproc systems, the answer is it depends. In all probability having the extra 8 threads running will result in some processor speed increase. It will be less than double, so 1.0 < x < 2.0. Of course if this spawns N more instances of SA, you will

Re: svn commit: r586641 - /spamassassin/trunk/lib/Mail/SpamAssassin/Message/Node.pm

2007-10-20 Thread Loren Wilton
This isn't the only concern. There are performance penalties once you bring the Encode module into play. Please see the discussion some months back when John Myers added the other Encoding stuff. Seems in this case that Encode was *solving* performance penalties. Loren

Re: move ClamAVPlugin into the core distro?

2007-10-11 Thread Loren Wilton
Honestly I'm -0.5 on this. SA isn't a virus scanner, and while it could The magic key that to my mind makes bringing it into the core set isn't virus, it's phish. Agreed, SA isn't a virus scanner and probably shouldn't be; it is quite inefficient at that sort of thing. But to the best of

Re: Easier Rules Work

2007-08-18 Thread Loren Wilton
You want to set up a mass checker. This will run on mbox files (I'm pretty sure) and it gives you combined and very useful stats on an entire group of rules you are testing, across all the mail in the files. You basically need a group of known ham mail and another group of known spam mail.

Re: [Bug 5590] Scantime is very long unless use bytes hack is used

2007-08-11 Thread Loren Wilton
Good catch, mine is UTF-8.. Not sure about the original reporter. Probably not, or they wouldn't be seeing the large difference they see. Loren

Re: add a new rule type: single-line body?

2007-07-28 Thread Loren Wilton
Just wondering. would it be handy to have a new body type, the same as body but matched as a single string, with all newlines converted to ? in other words, this text: It might be beneficial to convert the newlines to spaces, but it might also be beneficial to leave them there so that they

Re: [Bug 5433] New: Need a workaround for Earthlink routing stupidity

2007-04-25 Thread Loren Wilton
I think Chris was talking about the same behavior yesterday... his message had the following headers in it... Chris a few weeks back, before trying to get away from Earthlink, was complaining about exactly the same 'nohelo' hop in the Earthlink routing chain. I haven't followed his recent

Re: a way to set scores for new rules in updates

2007-02-07 Thread Loren Wilton
Bob had a technique that worked reasonably well, and only required some minor human thought to usually come up with good numbers. I'm pretty sure that it took overlap into account. (Although the overlap runs may have been something he did manually, I don't recall.) He seems to be gone to

Re: Score Test?

2006-12-20 Thread Loren Wilton
A) make a specific rule or rule set test at (or near) the end of the tests With some recent version of SA (3.2? Don't recall.) you can set a priority on the rule and have it run as one of the last rules. Come to think of it, you aren't talking about short-circuiting, just having it run

Re: plugins in sa-update

2006-07-01 Thread Loren Wilton
required. I'm not 100% definite though. let's see if anyone else weighs in ;) As far as I'm concerned rulz is rulz. If it is a rule that requires new code to work, then the new code better in some way come with the new rule. Otherwise there is no point in distributing the (unworkable) rule,

Re: Quick Q about RE coding style

2006-06-29 Thread Loren Wilton
I'd personally prefer the second form. Every time I see something like the first form I always wonder if it was deliberate or someone's fingers slipped and entered an extra character. Or they took out an option and missed deleting the pipe. Loren

Re: [Bug 3109] RFE: really simple this is ham shortcircuiting

2006-04-12 Thread Loren Wilton
Someone please remind me why the email score simply isn't the score total up to and including the short-circuit rule? I'm assuming that in general short circuit rules are going to run relatively early (else why bother?) so except for a short-circuit meta there should be relatively few scores in

Re: [Bug 4853] New: SA not matching *any* spam

2006-04-02 Thread Loren Wilton
Incidentally I'm using spamass-milter to pipe mail via milter to sa. Spamass-milter is known to have problems with 3.1.1. Look at some of the mail coming through and see if you are getting headers leaking down into the body of the messages.

Re: Weird comment in DomainKeys plugin

2006-03-24 Thread Loren Wilton
Total guess: that comment was left over from domainkeys, from before the SA headers were moved up to the top. Loren

Re: move full rule functionality into a default-off plugin

2006-03-09 Thread Loren Wilton
Just for history sake, the reason we made a MIMEHeader plugin in the first place (included in 3.1) was because it was asked for in bug 3781 by Loren. So I'm kind of surprised that it wasn't being used already. Ah. I think we may have missed that it came into existence. Is this disabled by

Re: move full rule functionality into a default-off plugin

2006-03-08 Thread Loren Wilton
of Bayes and URIBL. There would probably be a much lower-overhead solution, say SpamBayes, if SA's rules capability is effectively removed. Which seems to be the effective intent of this proposal. Loren Wilton

Re: moving uribl to rules dir (Re: svn commit: r383618 - in /spamassassin: rules/trunk/core/25_replace.cf rules/trunk/core/60_whitelist_spf.cf trunk/rules/25_replace.cf trunk/rules/60_whitelist_spf.cf

2006-03-08 Thread Loren Wilton
If it's a plugin, it has to be a code-tied rule! Otherwise it wouldn't need the plugin. Hey, what a neat way to completely disable the initial concept of the Rules project and put things back into the Land Of Arcana where they belong! Just move 'body', 'rawbody', 'header', and 'full' to

Re: move full rule functionality into a default-off plugin

2006-03-08 Thread Loren Wilton
It does introduce the danger of algorithmic complexity attacks if .* is used instead of .{0,20} though -- but we may be able to help this if we spot that kind of thing in --lint. I still don't understand why .* is more dangerous in rawbody rules than it is in full rules. Any cases where it
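The difference between `.*` and `.{0,20}` is easy to see on a small input. The sketch below, in Python, shows only how far each pattern is allowed to reach, not the running-time cost under discussion; the sample strings are made up, and `re.DOTALL` is used so the gap may cross newlines the way it could in a multi-line buffer.

```python
import re

UNBOUNDED = re.compile(r"free.*money", re.DOTALL)
BOUNDED = re.compile(r"free.{0,20}money", re.DOTALL)

near = "free gift and money"                  # 10 characters between the words
far = "free" + " filler" * 10 + " money"      # 71 characters between the words

for label, text in (("near", near), ("far", far)):
    print(label, bool(UNBOUNDED.search(text)), bool(BOUNDED.search(text)))
```

The bounded form caps the span the engine may scan for each candidate start, which is the usual reason for preferring it in rules.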

Re: move full rule functionality into a default-off plugin

2006-03-08 Thread Loren Wilton
my $text = $parts[$pt]->decode(); $text =~ tr/ \t\n\r\x0b\xa0/ /s; # whitespace => space push(@{$self->{text_decoded}}, split_into_array_of_short_lines($text)); What does split_into_array_of_short_lines do? This sounds to me like it still ends up with individual lines fed to the
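The quoted `tr/.../ /s` line maps each listed whitespace character to a space and squeezes runs of them into one. A rough Python equivalent, offered as a sketch only, is below; the helper name is hypothetical, and the behaviour of split_into_array_of_short_lines is deliberately not reproduced since the snippet does not show it.

```python
import re

# Same character set as the tr/// above: space, tab, LF, CR, VT, NBSP.
_WS_RUN = re.compile(r"[ \t\n\r\x0b\xa0]+")


def squeeze_whitespace(text: str) -> str:
    """Collapse every run of the listed whitespace characters to one space."""
    return _WS_RUN.sub(" ", text)


print(repr(squeeze_whitespace("a \t\n b\xa0\xa0c")))
```

After this step the text contains no newlines at all, so any later line structure comes from whatever splitting is applied afterwards.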

Re: tvd-evaltoplugin

2006-02-25 Thread Loren Wilton
One big plugin would be better than the current split. The current split has no solid technical rationale behind it. - allows eval rules to not be loaded. arguably, most of them will always be enabled, but some could be disabled. DNSEval, for instance, is only useful in net mode. If

Re: Rule Timeouts (was users@ Re: Two mails completely blocking SA 3.1.0 !)

2006-02-14 Thread Loren Wilton
Should we be wrapping full rules in alarms (using M::SA::Timeout) to prevent this? You can do this with any rule, a full rule is just easier to mess up. I'd be concerned of the overhead (and probable timing holes) in wrapping every rule in an alarm(). As an alternative, how about wrapping

Re: local_state_dir stuff in 3.2 breaks eval rules ...?

2006-02-10 Thread Loren Wilton
default_rules_path (/usr/share/spamassassin) site_rules_path (/etc/mail/spamassassin) default_userprefs_path (~/.spamassassin/user_prefs) Doesn't that imply that site rules override local rules? Surely those are in the other order? Or is there magic when reading the second file

Re: [Bug 4766] New: remove SUBJ_HAS_UNIQ_ID and triplets.txt code

2006-01-21 Thread Loren Wilton
in other words it's been dropping from a high of 19.348% of spam to just 0.38% nowadays. Which isn't to say that there aren't unique ids in modern subjects. They just aren't in a form this can detect. :-) Loren

Re: Charset normalization issue (report, patch, and request)

2006-01-14 Thread Loren Wilton
As an outsider, I find myself strongly agreeing with Motohraru-san that, when dealing with at least the oriental multibyte languages, tokenization belongs early in the stream, before both bayes and rules. Of course this is an overhead penalty that should not occur on mail that isn't likely to be

Re: Security-related bugs

2006-01-11 Thread Loren Wilton
IMO, bugs which allow any specially crafted spammy message to get through, even if the method used is to crash spamd or stand-alone SA, is NOT a security bug, provided the only damage is to SA/spamd and the resulting FN. That's a bug, pure and simple, no matter how creative the spammer is.

Re: What's up with these URLs?

2006-01-11 Thread Loren Wilton
At a guess: IE and apparently Firefox have search for url enabled by default. In IE that consists of sticking .com, .net, etc suffixes on, and I think trying a www. prefix. From a report on the user's list, it appears that Firefox goes farther and will do a google search, resulting in a tinyurl

Re: tvd-evaltoplugin

2006-01-06 Thread Loren Wilton
You can do that with the plain regex rules thanks to the experimental and rather loony (?{...}) and (??{...}) constructs. Well no. You could do that on 2.6x, and I used that for some very valuable rule development tools. That ability was removed in 3.x. Loren

Re: DATE_IN_ tests

2006-01-06 Thread Loren Wilton
anyway, I've just checked in a change that'll allow hit-rates all the way down to 0.02%. why not. ;) I guess I question active hitrates much under 1%. The key there is 'active'. Things that may be hitting next to nothing in one corpus might be hitting well in another one. Loren

Re: promoting spamtraps

2006-01-01 Thread Loren Wilton
Whether your idea is good or not, it has to do with a suggestion for how to use sa-learn, not anything to do with development. Hi Sidney, happy new year! Actually, while he phrased the RFE in terms of sa-learn, it is actually something that could be done as an SA plugin, if SA were run on the

Re: tvd-evaltoplugin

2005-12-30 Thread Loren Wilton
Converting sections of tests into plugins where some people will want to disable the entire set due to performance, memory, or similar constraints (i.e., Bayes tests, network tests, special functionality, etc.) does make sense. However, converting individual (or nearly individual) tests that

Re: 3.0.5 rescoring

2005-11-21 Thread Loren Wilton
Hello Warren, There was also a recent discussion about using SVM scoring techniques, and someone posted a tool to do that. I believe the claim was that it produced reasonable scoring with less effort than the normal method. Perhaps that could be used here? Loren

Re: rule promotion criteria

2005-11-12 Thread Loren Wilton
Looks generally good. Minor comments: 1. Bob had a thing built into his version of mass-check that assigns default scores. I'm not clear on the basis for this (although he has explained it any number of times) but it is fairly simple and seems to do a decent job, shy of a full scoring run. I'm

Re: [Bug 4679] Bayes is undocumented in README, USAGE, INSTALL and UPGRADE

2005-11-10 Thread Loren Wilton
'As a collaborative documentation platform, the wiki has already proved much more effective than our SVN codebase.' So why not write a routine to scrape the Wiki on the day of release and stick the pages into files in the release tree? Loren

Re: [Bug 4594] spamd dies unexpectedly: prefork: ordered child to accept, but child reported state '1'

2005-11-07 Thread Loren Wilton
Not in my case Tom. I actually have all the Bayes features disabled and the error still happened on my installation. But do you have AWL disabled too?

Re: Nightly runs still not working right

2005-11-07 Thread Loren Wilton
I suppose mkrules could be changed to cat all the files parsed so far, so that a sandbox file can refer to a core file's rule by name (since sandbox will be compiled after core); but I quite like the side-effect of restricting sandbox files to only being able to affect rules in their own

Re: hit-rate-over-time graphs

2005-10-28 Thread Loren Wilton
Hum. Is there any way to configure some default colors for the graph? On a PC it seems Quicktime prints the thing out, and it is near unreadable. I see a black square with a straight yellow line in the center and some wiggly lines near the bottom. I *think* there might be some text in the

Re: Bugzilla has moved!

2005-10-19 Thread Loren Wilton
Now that I can log in, I see why it isn't really important. Loren

Re: How to use sandboxes?

2005-10-18 Thread Loren Wilton
Some random comments: So the idea is that the source code for all rules (apart from the legacy core and lang sets) remains in the sandbox dirs; in other words, there's no need to cut and paste and move around rules when they're promoted from testing status, to live core status. I'm not

Re: Bugzilla has moved!

2005-10-18 Thread Loren Wilton
Not too important, but the quip software is dumping SQL debug info: Maybe that depends on what you are doing. I tried to log in unsuccessfully: Software error:DBD::mysql::st execute failed: You have an error in your SQL syntax near '' at line 1 [for Statement "SELECT login_name FROM

Re: BZ and rules

2005-10-13 Thread Loren Wilton
You know, I don't know if there'd be a separate bugzilla. good question... I think the most likely thing would be that the rules project stuff would be under the (existing) Rules component in BZ. I don't know that BZ would get much use or be of much use in day to day rules testing and

Re: rules project -- a new way to do fast-turnaround mass-checks

2005-10-05 Thread Loren Wilton
Please let me know what you think! Daryl and Chris both make a number of good points, but the buildbot idea also seems to have a good deal of merit. A creative solution for the 'private corpus' problem that Chris mentions might help a lot though. Unfortunately I don't have one at the moment,

Re: [Bug 4442] Lint should warn if user rules found and allow_user_rules not set

2005-09-20 Thread Loren Wilton
Well, user rules are always allowed when 'spamassassin' is run so a --lint message would have to say if you plan on using spamd your user rules won't be used. On the other hand, spamd when called with -Dconfig, will tell you it's not parsing each of your user rules. So... do we really want

Re: Spamassassin for TREC (fwd)

2005-09-08 Thread Loren Wilton
Note also echo score MICROSOFT_EXECUTABLE 4 .spamassassin/user_prefs Isn't that a 2.6x rule that went away in 3.0? I would hope that anything comparing filtering results (as I would guess this to be, knowing nothing of it) would be using a reasonably recent version. (Of course it would

Re: [Bug 4415] Intermittent __alarm__ errors with various plugins

2005-09-08 Thread Loren Wilton
As anecdotal evidence, it's my belief that people are seeing _alarm_ log records and associated scan failures on both rc1 and rc2, and that they are occurring with more than just Pyzor. This is anecdotal however, I don't have any evidence to hand to support that. I'm personally wondering if this

Re: Sizing a system for Spam Assassin

2005-09-07 Thread Loren Wilton
Better asked on the user's list, where there are people running systems like that. Loren

Re: initial rules organization ideas

2005-08-23 Thread Loren Wilton
Justin writes: I think we don't even need to do that; once we get the search directories recursively code worked out for configuration and rules, plugins will be loadable from *any* directory in the rules project: ROOT/rules/group/20_name_of_file.cf

Re: initial rules organization ideas

2005-08-23 Thread Loren Wilton
I *think* what Daniel was thinking of here, which should work, is just using the ifversion commands to conditionalize too-advanced rules. Assuming ifversion can be used in the negative also. For instance, we have one set of meta rules that use addition post-whatever, and do a less-good job

Re: IPv6 in DNS-RBL. First glitch found.

2005-08-22 Thread Loren Wilton
Just looking from the sidelines, it seems the obvious answer would be to add a new namespace to the blacklist. eg: *.2.1.9.ipv6.rbl.example.org. instead of *.2.1.9.rbl.example.org. Since this is for numeric lookups, an alpha or alphanum tag in what would be the high octet of the ipv4 dotted
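A separate namespace for IPv6 lookups could be built along the following lines. This Python sketch assumes the common reversed-nibble layout (the same ordering used under ip6.arpa) and reuses the `ipv6.rbl.example.org` zone from the example above; the function name is hypothetical and no real blacklist is implied.

```python
import ipaddress


def ipv6_dnsbl_name(address: str, zone: str = "ipv6.rbl.example.org") -> str:
    """Build a lookup name: reversed hex nibbles of the address, then the zone."""
    ip = ipaddress.IPv6Address(address)
    nibbles = ip.exploded.replace(":", "")      # always 32 hex digits
    return ".".join(reversed(nibbles)) + "." + zone


print(ipv6_dnsbl_name("2001:db8::1"))
```

Because the IPv6 names live under their own label, they cannot collide with the four-octet IPv4 names in the parent zone.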

Re: [Bug 4547] Spamassassin not checking messages

2005-08-19 Thread Loren Wilton
How big are they? SA is set up to bypass messages over a given size.

Re: Preliminary design proposal for charset normalization support in SpamAssassin

2005-08-19 Thread Loren Wilton
The following functions, immediately after they call Mail::SpamAssassin::Message::Node::decode, need to call a function that does charset normalization. * Mail::SpamAssassin::Message::get_rendered_body_text_array * Mail::SpamAssassin::Message::get_visible_rendered_body_text_array *

Re: initial rules organization ideas

2005-08-18 Thread Loren Wilton
Agree in general, but possibly... 2. code-tied rules stay with main tree in current rules directory with the exception of 25_replace.cf which is really just another way to write body/header rules (basically, the static stuff that is tied to code does not move to the rules project)

Re: problems detecting URIs embedded in JIS encoding

2005-08-09 Thread Loren Wilton
Could you please point this thread at the two bug numbers? I'd like to target these for a future 3.0.5 bug-fix release, because we are very unlikely able to upgrade our Enterprise distro to 3.1 in the short to medium term. (I am hoping in the long term to have both RHEL4 and RHEL5 on

Re: problems detecting URIs embedded in JIS encoding

2005-08-08 Thread Loren Wilton
This is quite similar to two recent bugs that caused similar problems if certain ascii characters immediately followed the URI. Spammers had exploited at least one of those cases. I don't know what the fix was for those bugs, but it may have been similar to the change you propose. Loren

Re: [Bug 4513] New: outgoing mail

2005-08-03 Thread Loren Wilton
You need to ask this question on the users list. This list is to discuss spamassassin development.

Re: [Bug 4514] New: Hotmail/dav mail from Outlook Express marked as FORGED_MUA_OUTLOOK

2005-08-03 Thread Loren Wilton
Are you SURE that was a valid message? If so, it will be the first recorded instance of X-Message-Info showing up in ham and not only in spam. Previously that had been a sure sign of a spam tool generated mail.

Re: PROPOSAL: create SpamAssassin Rules Project

2005-08-01 Thread Loren Wilton
naming isn't really much of a big deal but it'd be nice to have some way to keep track of that. (not that I can think of it.) Look at some of the SARE rule files that Bob maintains. He has a formalized set of comments that get stuck to rules, and one of these can/does show the history

Re: PerMsgStatus

2005-07-29 Thread Loren Wilton
a) what the heck are priorities, who sets them, and do they really have any justifiable purpose? Ie: can they just quietly vanish into the night with nobody being any the wiser? They order the rules -- or more correctly, sets of rules. Most rules are priority 500 (iirc), but some need

Thoughts/ramblings on rule short circuiting

2005-07-29 Thread Loren Wilton
I was thinking about the 'best' way to shortcut running rules when they weren't needed, and suddenly realized there might be cases where it is necessary to run them even though they won't determine the hammyness or spammyness of the mail. In particular, I'm wondering about bayes and awl

Re: Thoughts/ramblings on rule short circuiting

2005-07-29 Thread Loren Wilton
It seems obvious that we want to run that -100 rule first. If it hits, the maximum possible score if *every* other rule hits will be 4, and with a threshold of 5, the mail can't be spam. So we can stop after the -100 rule hits, and only run one rule on this mail. This just brought up an
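The arithmetic in the quoted example generalizes to a simple bound: stop once the current score plus every remaining positive score still falls short of the threshold. The Python below is a sketch of that check only; the function name and the individual rule scores are hypothetical, chosen so the positives sum to 104 as in the example.

```python
def can_stop_as_ham(score_so_far: float, remaining_scores, threshold: float = 5.0) -> bool:
    """True if hitting every remaining positive rule still cannot reach threshold."""
    best_case_for_spam = score_so_far + sum(s for s in remaining_scores if s > 0)
    return best_case_for_spam < threshold


# The -100 rule has hit; the other rules could add at most 104 points.
print(can_stop_as_ham(-100.0, [50.0, 54.0, -1.0]))
```

With the -100 rule hit, the best case for spam is 4, below the threshold of 5, so evaluation could stop there; whether it should is the question raised in the reply.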

Re: [Bug 4505] Score generation for SpamAssassin 3.1

2005-07-29 Thread Loren Wilton
+score BAYES_50 0 0 0.845 0.001 # n=1 +score BAYES_60 0 0 2.312 0.372 # n=1 +score BAYES_80 0 0 2.775 2.087 # n=1 +score BAYES_95 0 0 3.023 2.063 # n=1 +score BAYES_99 0 0 2.960 1.886 # n=1 I think the score for BAYES_99 should be hand tweaked, regardless of what the score generator said. This

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-27 Thread Loren Wilton
Example: I am currently writing a very FEW rules, some from scratch and some by adapting the work or ideas of others from such lists or web sites. You have all convinced me that if I post a rule for discussion that it is then close to worthless. It depends on how you post it. And it may

Re: Re[4]: Hackathon summary

2005-07-26 Thread Loren Wilton
How would we determine ham/spam? At this point all we have is SA's first estimation, and no way of knowing whether this is accurate, FN, or FP. All we could reasonably do is take SA's assessment of the message and assume that statistically it will be correct to one or two sigma or so. If the

Re: Re[4]: Hackathon summary

2005-07-26 Thread Loren Wilton
More thought ... what if SA systems were to accumulate daily statistics, along the lines of one record for each rule, containing: That sounds like the general sort of vague idea I had, fleshed out in more detail. Certainly the desirable goal is basically: 1 does this rule hit anything? 2 does

Re: Re[2]: Hackathon summary

2005-07-25 Thread Loren Wilton
That's why we use 70_sare_name_eng.cf files, to indicate that these rules work well only on systems which expect almost 100% English ham, and little to no ham in other languages. I've begun to wonder whether it might be worth while having 50_scores.cf for English emails, and then

Re: compiled user rules

2005-07-24 Thread Loren Wilton
it's not a matter of popularity -- it's a matter of being horrendously difficult to support. I grant from what I've seen of PMS that this gets pretty ugly. Or at least it seems to to me, but then a lot of apparently good Perl looks pretty ugly to me. ;-) But I'm a C++ and Algol programmer,

Re: [Bug 4497] New: reorganise PerMsgStatus code

2005-07-23 Thread Loren Wilton
I know user rules aren't real popular with the sa dev community, however that attitude isn't universally shared by sa users. Therefore may I suggest: Would it be possible when reorganizing things to come up with some semi-persistent storage for compiled user rules, so that they don't have to be

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-21 Thread Loren Wilton
Duncan earlier enscribed: Masscheck has an interdependency option, although it increases the checking time. We use it on rules once they seem useful, but not usually in early one-off checking. I'm not sure what you mean by this. We have an overlap script which does some of this -- is that

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-21 Thread Loren Wilton
I'm *really worried* about proposals that involve mailing lists that have only private archives and require moderator approval for subscription. It just doesn't feel right for an open source project. I understand the feeling. I'm trying to balance the obvious desire for a completely public

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-21 Thread Loren Wilton
I guess you'd have better data than I would; but I'm still having trouble believing that Spammers are adjusting on that time frame. Some do; not all do. However, the ones that can adjust in less than a day, or maybe less than 2-3 days sometimes, tend to be some of the more prolific spammers.

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-21 Thread Loren Wilton
May I help? (How will you folks decide) Well, to paraphrase how we decide in SARE -- do something, we'll watch. And it really is pretty much that simple. I expect (and this is personal opinion, I'm not an SA dev) that the rules subproject will sooner or later consist of anointed

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
I'd like to see if there's a way to combine the two somehow so that new SVN commits that update sandbox rules, are immediately mass-checked alone. However, I can't see a way to do that reliably from SVN commits alone, because (for example) meta rules may depend on other rules that were not

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
What I miss most is a transparent dataset about every rule. I'd like to know - percentage of false positives - percentage of false negatives - percentage of true positives - percentage of true negatives - number of mails checked for the results above - standard deviation of the percentages
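The per-rule figures requested above could be derived from four counts. The Python sketch below assumes a rule aimed at spam, so a hit on ham counts as a false positive and a miss on spam as a false negative; the class and field names are hypothetical, the sample counts are invented, and the standard deviation item is left out because it needs per-corpus data the list does not describe.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    spam_hits: int    # spam messages the rule fired on
    ham_hits: int     # ham messages the rule fired on
    spam_total: int   # spam messages checked
    ham_total: int    # ham messages checked

    def rates(self) -> dict:
        """Percentages for a spam-targeting rule, plus the sample size."""
        return {
            "true_positive": 100.0 * self.spam_hits / self.spam_total,
            "false_negative": 100.0 * (self.spam_total - self.spam_hits) / self.spam_total,
            "false_positive": 100.0 * self.ham_hits / self.ham_total,
            "true_negative": 100.0 * (self.ham_total - self.ham_hits) / self.ham_total,
            "mails_checked": self.spam_total + self.ham_total,
        }


print(RuleStats(spam_hits=50, ham_hits=1, spam_total=200, ham_total=100).rates())
```

Reporting the number of mails checked alongside the percentages matters, since a 1% false positive rate on 100 hams is a single message.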

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
Sidney writes: Perhaps we could use SVN to check in rule submissions so they are version controlled and tracked, and have emails refer to file paths and version numbers instead of attaching the rules. Would that be too complex for the people we want to attract compared to mailing in sets of rules

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
Could the list be a semi-private one, with moderated subscription and posting? That'd take care of rules in development being exposed to spammers while they're still being worked on, at least partially. The SARE list is private and invitation only for exactly these reasons. You don't want to

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
Sidney writes: Dealing with metarules and modifications to them presents a problem in any case. How do we deal with person X submitting a modification to metarule A and proposed rule A1, while person Y submits a different modification to metarule A and proposed rule A2 while person Z submits

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
Dealing with metarules and modifications to them presents a problem in any case. How do we deal with person X submitting a modification to metarule A and proposed rule A1, while person Y submits a different modification to metarule A and proposed rule A2 while person Z submits proposed

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-20 Thread Loren Wilton
--- I guess that part of making the rule submission and test process nimble is for the submitted rules to be independent of anything else. That makes changing metarules less of a nimble process. That's fine, because metarules are really just an optimization which can be implemented after the fact

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-19 Thread Loren Wilton
A big part (perhaps the biggest part) of rules development is the mass check. Most anyone can develop a rule on their home system and see how they *think* it works. Some few (but not many) people can do a mass-check on their home system and see how it *really* works - *for them*. As proposed,

Re: PROPOSAL: create SpamAssassin Rules Project

2005-07-19 Thread Loren Wilton
As rules are put into the sandboxes, they become part of svn. When the nightly mass-checks are run, each person pulls the latest rules sandboxes from svn and does their mass-check with all of those, then rsyncs the results back up to the central site once the mass-check completes. I think I
