Re: collecting corpora

2009-08-13 Thread Warren Togami
On 08/13/2009 07:33 PM, Michael Parker wrote: Historical accuracy of network tests is key, providing corpora without SpamAssassin rules from actual receive time does not help scoring, it hurts it. Michael Then shouldn't the documentation mention this? My corpora is inconsistently filtered

Re: collecting corpora

2009-08-13 Thread Michael Parker
On Aug 13, 2009, at 4:07 PM, Warren Togami wrote: On 08/13/2009 11:04 AM, Justin Mason wrote: IMHO, none of the network tests should be used during masscheck for ham older than 4 weeks. Thoughts? if we had enough ham to get useful results with that limit, sure. As it is, I'm not sure

Re: collecting corpora

2009-08-13 Thread Warren Togami
On 08/13/2009 11:04 AM, Justin Mason wrote: IMHO, none of the network tests should be used during masscheck for ham older than 4 weeks. Thoughts? if we had enough ham to get useful results with that limit, sure. As it is, I'm not sure that's the case. If we are in agreement that old netw

[Bug 5878] IPV4_ADDRESS regexp matches ip.ad.dr.in-addr.arpa format

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5878 Mark Martinec changed: What|Removed |Added Status|NEW |RESOLVED Resolution|

[Bug 5878] IPV4_ADDRESS regexp matches ip.ad.dr.in-addr.arpa format

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5878 --- Comment #8 from Justin Mason 2009-08-13 09:45:43 PST --- (In reply to comment #7) > Bug 5878: IPV4_ADDRESS regexp matches ip.ad.dr.in-addr.arpa format > (attempting a fix; do we have any tests for this?) yep, I think we do.

[Bug 5878] IPV4_ADDRESS regexp matches ip.ad.dr.in-addr.arpa format

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5878 --- Comment #7 from Mark Martinec 2009-08-13 09:08:22 PST --- Bug 5878: IPV4_ADDRESS regexp matches ip.ad.dr.in-addr.arpa format (attempting a fix; do we have any tests for this?) Sending lib/Mail/SpamAssassin/Plugin/Rel

[Bug 5878] IPV4_ADDRESS regexp matches ip.ad.dr.in-addr.arpa format

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5878 --- Comment #6 from Mark Martinec 2009-08-13 08:54:47 PST --- Created an attachment (id=4516) --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4516) Suggested patch -- Configure bugmail: https://issues.apache.org/S
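
Note on Bug 5878: the problem named in the bug title can be illustrated with a short sketch. The patterns below are simplified, hypothetical stand-ins, not SpamAssassin's actual IPV4_ADDRESS expression and not the attached patch. They only show how an unanchored dotted-quad regexp also matches the leading labels of a reverse-DNS name, and how a negative lookahead is one possible way to avoid that.

```python
import re

# Hypothetical simplified octet pattern (not SpamAssassin's IPV4_ADDRESS).
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Naive dotted quad: four octets separated by dots.
NAIVE_IPV4 = re.compile(rf"\b{OCTET}(?:\.{OCTET}){{3}}\b")

# Same pattern, but rejecting a match that is followed by ".in-addr.arpa".
STRICT_IPV4 = re.compile(
    rf"\b{OCTET}(?:\.{OCTET}){{3}}\b(?!\.in-addr\.arpa\b)", re.IGNORECASE
)

def find_ipv4(pattern, text):
    """Return every substring of text that the given pattern matches."""
    return pattern.findall(text)
```

With these assumed patterns, `find_ipv4(NAIVE_IPV4, "4.3.2.1.in-addr.arpa")` picks up the reversed octets as if they were an address, while `STRICT_IPV4` leaves the reverse-DNS name alone and still matches an ordinary address such as `192.168.0.1`.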

Custom rules with Custom Corpus and Nightly Mass Check [bugs.debian.org, lists.debian.org]

2009-08-13 Thread Don Armstrong
I'm interested in being able to run Nightly Mass Check on a large set of custom rules[0] (currently around 1300) with a fairly large spam and ham corpus from bugs.debian.org and lists.debian.org. I'm not sure whether this is something that 1) we could have integrated into the normal nightly mass c

Re: Time Based Reuse for Rules

2009-08-13 Thread Warren Togami
On 08/13/2009 10:57 AM, Justin Mason wrote: This is a patch to the Reuse plugin I'd like to see, if someone has any ideas please speak up. it'd work well there. Pretend to be able to reuse previous hits, but the actual effect would be to inhibit the rule entirely on old mails. My corpus is

Re: collecting corpora

2009-08-13 Thread Warren Togami
On 08/13/2009 11:04 AM, Justin Mason wrote: yep, I was talking with a SURBLer about this last week I think. we should probably add meta conditions to the URIBL ruleset to ensure they don't fire at all on old messages. IMHO, none of the network tests should be used during masscheck for ham ol
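
Note on the age cutoff discussed in this thread: the idea of skipping network tests for old mail can be sketched as a simple date comparison. This is only an illustration under stated assumptions. The function and constant names are hypothetical, and the four-week value is the figure suggested in the message above, not a setting taken from masscheck or the Reuse plugin.

```python
from datetime import datetime, timedelta, timezone

# Assumed cutoff, taken from the "4 weeks" figure suggested in the thread.
MAX_NETWORK_TEST_AGE = timedelta(weeks=4)

def network_tests_allowed(received_at, now, max_age=MAX_NETWORK_TEST_AGE):
    """Return True if the message is recent enough for network rules to be meaningful.

    Both arguments are timezone-aware datetimes; received_at is when the
    message was originally received, now is when the mass check runs.
    """
    return now - received_at <= max_age

# Example: a mass check run on the date of this thread.
run_time = datetime(2009, 8, 13, tzinfo=timezone.utc)
recent = datetime(2009, 8, 1, tzinfo=timezone.utc)
old = datetime(2009, 6, 1, tzinfo=timezone.utc)
```

Here `network_tests_allowed(recent, run_time)` is true and `network_tests_allowed(old, run_time)` is false, so under this sketch only the recent message would be scored against network rules.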

Re: collecting corpora

2009-08-13 Thread Justin Mason
On Thu, Aug 13, 2009 at 16:00, Warren Togami wrote: > On 08/13/2009 07:26 AM, Justin Mason wrote: >> >> On Thu, Aug 13, 2009 at 11:46, Jeff Chan wrote: >>> >>> On Thursday, July 16, 2009, 1:40:34 PM, Justin Mason wrote: One useful factor of ham is that it's not time-sensitive; a mail tha

Re: collecting corpora

2009-08-13 Thread Warren Togami
On 08/13/2009 07:26 AM, Justin Mason wrote: On Thu, Aug 13, 2009 at 11:46, Jeff Chan wrote: On Thursday, July 16, 2009, 1:40:34 PM, Justin Mason wrote: One useful factor of ham is that it's not time-sensitive; a mail that was ham in 2003 would still be ham today. So we can collect old ham mai

Re: Time Based Reuse for Rules

2009-08-13 Thread Justin Mason
On Thu, Aug 13, 2009 at 15:32, Michael Parker wrote: > > On Aug 13, 2009, at 4:26 AM, Justin Mason wrote: > >> On Thu, Aug 13, 2009 at 11:46, Jeff Chan wrote: >>> >>> On Thursday, July 16, 2009, 1:40:34 PM, Justin Mason wrote: One useful factor of ham is that it's not time-sensitive; a ma

[Bug 6180] make test fails for dcc and dkim2 and razor2

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6180 --- Comment #9 from Larry Rosenbaum 2009-08-13 07:34:06 PST --- (In reply to comment #8) > > Created an attachment (id=4515) --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4515) > > gzipped outp

Time Based Reuse for Rules

2009-08-13 Thread Michael Parker
On Aug 13, 2009, at 4:26 AM, Justin Mason wrote: On Thu, Aug 13, 2009 at 11:46, Jeff Chan wrote: On Thursday, July 16, 2009, 1:40:34 PM, Justin Mason wrote: One useful factor of ham is that it's not time-sensitive; a mail that was ham in 2003 would still be ham today. So we can collect old

[Bug 6180] make test fails for dcc and dkim2 and razor2

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6180 --- Comment #8 from Mark Martinec 2009-08-13 07:08:15 PST --- > Created an attachment (id=4515) --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4515) > gzipped output from prove -v t/dkim2.t with debug 'all

[Bug 6180] make test fails for dcc and dkim2 and razor2

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6180 --- Comment #7 from Larry Rosenbaum 2009-08-13 06:56:32 PST --- Created an attachment (id=4515) --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4515) gzipped output from prove -v t/dkim2.t with debug 'all' -- Confi

[Bug 6180] make test fails for dcc and dkim2 and razor2

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6180 --- Comment #6 from Larry Rosenbaum 2009-08-13 06:55:05 PST --- Created an attachment (id=4514) --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4514) Output from prove -v t/razor2.t Requested by Justin -- Configur

[Bug 6180] make test fails for dcc and dkim2 and razor2

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6180 --- Comment #5 from Larry Rosenbaum 2009-08-13 06:52:32 PST --- Created an attachment (id=4513) --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4513) Retest with new DCC.pm It works! -- Configure bugmail: https:/

[Bug 6180] make test fails for dcc and dkim2 and razor2

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6180 --- Comment #4 from Mark Martinec 2009-08-13 05:33:12 PST --- Created an attachment (id=4512) --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4512) a replacement DCC.pm Please try the: prove -v t/dcc.t with this rep

[Bug 6180] make test fails for dcc and dkim2 and razor2

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6180 --- Comment #3 from Mark Martinec 2009-08-13 05:23:56 PST --- Bug 5649, Bug 6180 - DCC plugin: - enable usage of a remote dccifd hosts by extending semantics of a config parameter dcc_dccifd_path, which can be either a socket p

[Bug 5649] PATCH: Enable usage of remote dccifd hosts

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5649 Mark Martinec changed: What|Removed |Added Status|NEW |RESOLVED Resolution|

[Bug 5380] SUBJECT_FUZZY_MEDS triggers on un-obfuscated meds and meds in a word

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5380 --- Comment #4 from Justin Mason 2009-08-13 04:33:27 PST --- btw if we can get some samples of what it's _supposed_ to hit, that would help too. (this is a new approach to rule regression testing I'm working on.) -- Configure bug

[Bug 6072] pod warnings

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6072 --- Comment #3 from Justin Mason 2009-08-13 04:32:39 PST --- (In reply to comment #2) > Patch still applies cleanly to r803779 ( > https://svn.apache.org/viewcvs.cgi?view=rev&rev=803779 ). > > Any interest in a Pod::Test test to c

[Bug 5073] Broken link to Mail::SpamAssassin::Conf POD documentation

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5073 Justin Mason changed: What|Removed |Added Status|NEW |RESOLVED Resolution|

Re: collecting corpora

2009-08-13 Thread Justin Mason
On Thu, Aug 13, 2009 at 11:46, Jeff Chan wrote: > On Thursday, July 16, 2009, 1:40:34 PM, Justin Mason wrote: >> One useful factor of ham is that it's not time-sensitive; a mail that >> was ham in 2003 would still be ham today.  So we can collect old ham >> mail archives, or submissions of relative

Re: collecting corpora

2009-08-13 Thread Jeff Chan
On Thursday, July 16, 2009, 1:40:34 PM, Justin Mason wrote: > One useful factor of ham is that it's not time-sensitive; a mail that > was ham in 2003 would still be ham today. So we can collect old ham > mail archives, or submissions of relatively old mail, if necessary. This may be a false assum

[Bug 6180] make test fails for dcc and dkim2 and razor2

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6180 --- Comment #2 from Justin Mason 2009-08-13 03:33:39 PST --- could you run "prove -v t/razor2.t" ? -- Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email --- You are receiving this mail because:

[Bug 5380] SUBJECT_FUZZY_MEDS triggers on un-obfuscated meds and meds in a word

2009-08-13 Thread bugzilla-daemon
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5380 --- Comment #3 from Matus UHLAR - fantomas 2009-08-13 00:32:05 PST --- the rule matches also on czech/slovak words "medzi" (inter), "obmedzit" (to limit). Yes I'd be glad if we'd have way to cut FPs down. -- Configure bugmail: h
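
Note on Bug 5380: the false positives on "medzi" and "obmedzit" reported above are the kind that appear when a pattern matches inside a longer word. The sketch below uses two hypothetical patterns for illustration only. Neither is the real SUBJECT_FUZZY_MEDS rule, which is not shown in this thread, and word boundaries are just one possible way to cut such hits.

```python
import re

# Hypothetical patterns for illustration only; not the real SUBJECT_FUZZY_MEDS rule.
UNANCHORED = re.compile(r"med", re.IGNORECASE)        # matches anywhere inside a word
WHOLE_WORD = re.compile(r"\bmeds?\b", re.IGNORECASE)  # matches only "med" or "meds" as a word

def hits(pattern, subject):
    """Return True if the pattern matches anywhere in the subject line."""
    return bool(pattern.search(subject))
```

With these assumed patterns, `UNANCHORED` hits both "medzi" and "obmedzit", while `WHOLE_WORD` skips them and still hits a subject such as "cheap meds online".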