https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #39 from Mark Martinec 2010-01-18 10:06:49 UTC ---
(In reply to comment #32)
> I would imagine that treating the multi-byte characters as individual bytes
> might bite us in ways similar to Bug 6183.
(In reply to comment #3
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #38 from Justin Mason 2010-01-18 09:32:06 UTC ---
regarding Bayes tokenization: use of byte-level breaks there is an explicit
choice, not a bug.
--
Configure bugmail:
https://issues.apache.org/SpamAssassin/userprefs.cgi?ta
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #37 from Warren Togami 2010-01-18 09:16:42 UTC ---
(In reply to comment #35)
> Actually, if you look at Bayes.pm you'll notice that it already has use bytes
> so I'm not sure it will have any effect on bayes tokenization.
O
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #36 from Henrik Krohns 2010-01-18 08:56:35 UTC ---
(In reply to comment #35)
> Actually, if you look at Bayes.pm you'll notice that it already has use bytes
> so I'm not sure it will have any effect on bayes tokenization.
If
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #33 from Warren Togami 2010-01-18 08:43:20 UTC ---
(In reply to comment #31)
> Does everyone agree that this is only isolated to small subsets of messages,
> rather than affecting scan speed for all, or a majority of, messag
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #35 from Michael Parker 2010-01-18 08:46:16 UTC ---
Actually, if you look at Bayes.pm you'll notice that it already has use bytes
so I'm not sure it will have any effect on bayes tokenization.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #34 from Henrik Krohns 2010-01-18 08:44:31 UTC ---
On behalf of my corpus, +1 for releasing 3.3.0 also.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #32 from Warren Togami 2010-01-18 08:26:41 UTC ---
(In reply to comment #28)
> Not surprisingly it affects Bayes, but only as slightly as the rules. Probably
> tokens containing highbits etc. It's simple to test with sa-lear
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #31 from Justin Mason 2010-01-18 08:23:27 UTC ---
Does everyone agree that this is only isolated to small subsets of messages,
rather than affecting scan speed for all, or a majority of, messages?
("mass-check" is the approp
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #30 from Justin Mason 2010-01-18 05:57:29 UTC ---
(In reply to comment #23)
> so that's a barely-noticeable difference of 4 mails (out of 5000 hams) more
> __HIGHBITS hits, and 2 fewer __TVD_SPACE_RATIO hits in spam. no scori
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #29 from Mark Martinec 2010-01-18 04:06:45 UTC ---
(In reply to comment #24)
> What is the exact patch to test "use bytes"? My corpus has lots of Japanese
> mail and I could run masscheck.
The exact location is not critica
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #28 from Henrik Krohns 2010-01-17 23:29:23 UTC ---
Not surprisingly, it affects Bayes, but only as slightly as it affects the
rules. Probably tokens containing highbits etc. It's simple to test by running
sa-learn and comparing dumps.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #27 from Warren Togami 2010-01-17 23:18:18 UTC ---
Even if the rule hits are roughly equivalent:
* Is this functionally equivalent for the tokens going in/out of Bayes?
* What about those reasons why this was removed years
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #26 from Henrik Krohns 2010-01-17 21:43:03 UTC ---
Looking at mass runtimes it's probably worth exploring only after 3.3.0. I'll
be enabling use bytes on my slow server though, since I get many bad messages
and the effect is
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #25 from Michael Parker 2010-01-17 20:31:03 UTC ---
Note that I've seen differences between 18 and 28 percent on a mass-check of
around 50k spam and ham.
I was seeing similar differences in my freqdiff, HIGHBITS was the hig
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #24 from Warren Togami 2010-01-17 16:58:35 UTC ---
What is the exact patch to test "use bytes"? My corpus has lots of Japanese
mail and I could run masscheck.
But given 48 minutes vs 50 minutes is this really worthwhile of
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #23 from Justin Mason 2010-01-17 16:24:37 UTC ---
(In reply to comment #22)
> (In reply to comment #21)
> > btw, I think this significant slowdown in 3.3.0 may be as a result of
> > increased use of replace_rules rules, comp
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
John Hardin changed:
What|Removed |Added
CC||jhar...@impsec.org
--- Comment #2
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #21 from Justin Mason 2010-01-16 15:58:59 UTC ---
btw, I think this significant slowdown in 3.3.0 may be as a result of
increased use of replace_rules rules, compared to when bug 4596 happened.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #20 from Henrik Krohns 2010-01-16 06:44:50 UTC ---
Surprisingly, total scan time was only 10min vs 13min in favor of use bytes.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #19 from Henrik Krohns 2010-01-16 06:38:51 UTC ---
Here's a quick run of 10k ham + 10k spam on perl 5.10.0.
$ ./freqdiff -c ham.log.nobytes ham.log.bytes
518 __HIGHBITS
64 T_HK_MUCHMONEY
52 __hk_million
39
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #18 from Michael Parker 2010-01-15 23:33:37 UTC ---
Some history:
http://svn.apache.org/viewvc?view=revision&revision=315047
And Bug 4596
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #17 from Warren Togami 2010-01-15 22:24:09 UTC ---
There is no doubt that this makes it faster, but I don't see any discussion
here verifying that it results in correct behavior. I personally am OK with
further delay of 3.3
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
Daryl C. W. O'Shea changed:
What|Removed |Added
CC||spamassas...@dostech.ca
--
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
Henrik Krohns changed:
What|Removed |Added
CC||h...@hege.li
--- Comment #15 fr
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
Mark Martinec changed:
What|Removed |Added
Platform|Sun |All
OS/Version|Solaris
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #14 from Mark Martinec 2010-01-15 16:25:14 UTC ---
Created an attachment (id=4644)
--> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4644)
my sample message, gzipped
Here attached is my newsletter sample as use
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #13 from Warren Togami 2010-01-15 13:29:03 UTC ---
The attached Test case takes ~2 seconds with Fedora 12 perl-5.10.0 here, and
~12 seconds with RHEL-5 perl-5.8.8.
Could someone please attach an example mail that is an even
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #12 from Mark Martinec 2010-01-14 17:56:39 UTC ---
I repeated the same test with 'sa-compile'-d rules, this time with
disabled HitFreqsRuleTiming, disabled debugging and disabled bayes.
The shown tests_pri_0 time corresponds
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #11 from Mark Martinec 2010-01-14 17:16:50 UTC ---
Took a 176 KB message that showed slow processing in the log, and ran it
with and without 'use bytes' in Message.pm. Network tests disabled,
some SARE rules, HitFreqsRuleTim
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Comment #10 from Michael Parker 2010-01-14 07:32:47 UTC ---
Can someone do a freqdiff between two runs, especially with mail that has
highbit characters.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
Justin Mason changed:
What|Removed |Added
Priority|P5 |P2
Severity|normal
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
Justin Mason changed:
What|Removed |Added
CC||j...@jmason.org
--- Comment #8 f
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
Matt Selsky changed:
What|Removed |Added
CC||sel...@columbia.edu
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
Mark Martinec changed:
What|Removed |Added
Target Milestone|Undefined |3.3.0
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
Larry Rosenbaum changed:
What|Removed |Added
CC||rosenbau...@ornl.gov
--- Comm
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Additional Comments From [EMAIL PROTECTED] 2008-01-15 12:22 ---
This problem still exists in v3.2.4.
--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Additional Comments From [EMAIL PROTECTED] 2007-08-11 20:18 ---
fwiw, I would just change your LANG environment variable.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Additional Comments From [EMAIL PROTECTED] 2007-08-11 19:41 ---
Side note: Mark Martinec suggested this might be due to UTF-8 encoding in the
locale.
While my test system did have en_US.UTF-8 as the LANG, resetting
/etc/
Matt Kettler wrote:
> Mark Martinec wrote:
>
>>> However, in mine the difference when using a "stock" 3.2.3 is barely
>>> noticeable, going from 9 seconds to 8 seconds.
>>>
>>> Adding in a good handful of SARE rules (1365 extra rules, counting "score"
>>> lines) makes the difference quite signif
Good catch, mine is UTF-8. Not sure about the original reporter.
Probably not, or they wouldn't be seeing the large difference they see.
Loren
Mark Martinec wrote:
>> However, in mine the difference when using a "stock" 3.2.3 is barely
>> noticeable, going from 9 seconds to 8 seconds.
>>
>> Adding in a good handful of SARE rules (1365 extra rules, counting "score"
>> lines) makes the difference quite significant.
>>
>> Without "use bytes"
> However, in mine the difference when using a "stock" 3.2.3 is barely
> noticeable, going from 9 seconds to 8 seconds.
>
> Adding in a good handful of SARE rules (1365 extra rules, counting "score"
> lines) makes the difference quite significant.
>
> Without "use bytes" and the SARE rules:
> real
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Additional Comments From [EMAIL PROTECTED] 2007-08-10 19:06 ---
Confirmed my test box can replicate the results using a crude: time spamassassin -t
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Additional Comments From [EMAIL PROTECTED] 2007-08-10 11:04 ---
Created an attachment (id=4083)
--> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4083&action=view)
spamd debug log from test case
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590
--- Additional Comments From [EMAIL PROTECTED] 2007-08-10 11:02 ---
Created an attachment (id=4082)
--> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4082&action=view)
Test case