First of all, please don't post in HTML.

Bill Polhemus was manually quoted as having said:
> I am running SA 2.60 installed from the RPMs on Red Hat 9, on an AMD
> 2100+ based system with a half-gig of RAM.

Personally, I've avoided RH > 7.3 for server work; RH8 and RH9 have seen
any number of really *strange* problems (mostly related to Perl).  There
are also a few other things I saw change in test installs of RH8 and RH9
that I really didn't want to have to deal with on the server.  YMMV.

> I have noticed that when I try to run sa-learn on a corpus of email
> that is “too large,” it will terminate with the message “segmentation
> fault.” Now, someone here says that’s not an SA problem, but a Perl
> problem. No matter. It is irregular and ought not to happen running a
> “standard” version of Perl (I’m using 5.8.0).

Perl has some oddball limitations that don't show up in most regular
use...  but when you start working with large datasets things can go a
little strange.  About the closest I've come to this sort of problem is
the fact that the filter server I'm administering is badly CPU-bound for
virus and spam scans, and so I have to be careful that the Perl-based
filter software I'm running (of several flavours) doesn't run up too
many concurrent copies.

The MIMEDefang list has had a few messages regarding a supporting tool
that does log analysis, and which seems to have troubles with very large
datasets (100K+ messages/day and up).  But it doesn't take the whole
system down...  just hoses or loses the db that it's interacting with
and sucks down all available memory.

> This has now happened for the second time. Before when it happened,
> about two weeks ago, I figured it was just a coincidence. Now, I’m
> positive that it’s SA-LEARN that is the culprit, either directly or
> indirectly.

Quite possible;  tokenizing and processing more than a few messages at a
time is likely to occupy quite a bit of memory.  I haven't seen problems
myself (learning ~600+ hams or spams on occasion), but I have recently
tried to make sure it doesn't auto-rebuild the .db files while
learning;  instead I run a cron job to do so (currently daily).

> SA-Learn works just fine on a “small” corpus. However, there seems to
> be a “dead zone” in there—maybe 200 emails or so—where SA-Learn
> “hangs,” not only itself but the whole system. I’m talking TOTAL
> freeze-up, have to hard-reset, everything.

Hmmm... That's not good.  Have you had more than one shell session open
while running sa-learn to see what the system load, memory usage, etc.
are doing while it's running?
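
If a second shell isn't practical (or the box freezes before you can read
anything off it), a small logger left running in the background can record
the same numbers.  The following is only a rough sketch, assuming a Linux
/proc/meminfo and a Python interpreter on the box; the function names and
the log path are made up for illustration.  It syncs every sample to disk
so the last few lines have a chance of surviving a hard reset.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number.

    Lines that are not of the form "Name:  <digits> ..." are skipped.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def log_system_state(log_path, interval=5.0, samples=None):
    """Append load average and free memory/swap to log_path.

    Takes one sample every `interval` seconds; runs forever when
    `samples` is None.  Assumes Linux (/proc/meminfo must exist).
    """
    taken = 0
    while samples is None or taken < samples:
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        load1 = os.getloadavg()[0]  # 1-minute load average
        with open(log_path, "a") as log:
            log.write("%s load1=%.2f MemFree=%s SwapFree=%s\n" % (
                time.strftime("%H:%M:%S"), load1,
                mem.get("MemFree", "?"), mem.get("SwapFree", "?")))
            log.flush()
            os.fsync(log.fileno())  # force the sample to disk before a freeze
        taken += 1
        time.sleep(interval)
```

Start log_system_state("/var/tmp/sa-learn-load.log") (hypothetical path) just
before kicking off sa-learn, and look at the tail of the log after the
reboot.  If free memory and swap both head for zero right before the log
stops, that points at memory exhaustion rather than anything more exotic.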

Is this a point failure (200 emails +/- 5-10; no failures above or below
this point) or just "too much data" (failure on any 200+ message mbox)?

I've had no problems (other than general system load and wall clock time
to complete the task) running sa-learn on 600-800+ message mbox files; 
either spam or ham.

> Even worse, it makes hash out of the filesystems, and it takes several
> hard resets before I get rid of the “kernel panic” messages!
> It hoses stuff up like I’ve NEVER seen before with Linux!

This is starting to sound more and more like a "small" hardware issue
that only shows up under load.  The *only* times I've consistently had
trouble with certain operations under Linux (aside from those times when
I've configured something incorrectly, and the software was doing
exactly as I had configured it to do- but not what I wanted it to do),
the hardware has been flaky to one degree or another.

> Something ain’t right here, I’m telling you. I don’t care if it is SA,
> Perl, whatever, there is no way that this should happen.

If you've got a test box you can sacrifice like this repeatedly,
sa-learn -D might provide a little more information that would actually
help solve the problem, or at least point a little closer to a solution.
If you just have the one production box (or that's the only place this
seems to happen), I'd suggest starting by putting together a tool to
split mbox files into 100-message chunks and running sa-learn on those
instead of continuing to attempt to learn larger mbox files.
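
A splitter along those lines could look something like this.  It's a
minimal sketch using Python's standard mailbox module, not a tested
production tool; split_mbox and the chunk file naming are my own invention,
and it assumes the source is a well-formed mbox that isn't being written to
while you split it.

```python
import mailbox
import os


def split_mbox(src_path, out_dir, chunk_size=100):
    """Split the mbox at src_path into files of at most chunk_size messages.

    Chunks are written to out_dir as <basename>.000, <basename>.001, ...
    Returns the list of chunk paths in order.  Note that an existing
    chunk file of the same name would be appended to, not overwritten.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(src_path)
    chunk_paths = []
    chunk = None
    src = mailbox.mbox(src_path, create=False)
    try:
        for count, message in enumerate(src):
            if count % chunk_size == 0:
                # Current chunk is full (or this is the first message):
                # close it and start the next one.
                if chunk is not None:
                    chunk.close()
                path = os.path.join(
                    out_dir, "%s.%03d" % (base, len(chunk_paths)))
                chunk = mailbox.mbox(path, create=True)
                chunk_paths.append(path)
            chunk.add(message)
    finally:
        if chunk is not None:
            chunk.close()
        src.close()
    return chunk_paths
```

Something like split_mbox("spam.mbox", "spam-chunks") then gives you a
directory of 100-message files, and you can feed each one to
sa-learn --spam --mbox (or --ham) in turn.  As a side benefit, if one
particular chunk kills the box every time, you've narrowed down where to
look.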

"Doctor, it hurts when I do this."
"So don't do that."

-kgd
-- 
<erno> hm. I've lost a machine.. literally _lost_. it responds to
ping, it works completely, I just can't figure out where in my
apartment it is.


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
