First of all, please don't post in HTML.

Bill Polhemus was manually quoted as having said:
> I am running SA 2.60 installed from the RPMs on Red Hat 9, on an AMD
> 2100+ based system with a half-gig of RAM.
Personally, I've avoided RH > 7.3 for server work; RH8 and RH9 have seen
any number of really *strange* problems (mostly related to Perl). There
are also a few other things I saw change in test installs of RH8 and RH9
that I really didn't want to have to deal with on the server. YMMV.

> I have noticed that when I try to run sa-learn on a corpus of email
> that is “too large,” it will terminate with the message “segmentation
> fault.” Now, someone here says that’s not an SA problem, but a Perl
> problem. No matter. It is irregular and ought not to happen running a
> “standard” version of Perl (I’m using 5.8.0).

Perl has some oddball limitations that don't show up in most regular
use... but when you start working with large datasets things can go a
little strange. About the closest I've come to this sort of problem is
that the filter server I'm administering is badly CPU-bound for virus
and spam scans, so I have to be careful that the Perl-based filter
software I'm running (of several flavours) doesn't run up too many
concurrent copies. The MIMEDefang list has had a few messages regarding
a supporting log-analysis tool that seems to have trouble with very
large datasets (100K+ messages/day and up). But it doesn't take the
whole system down... it just hoses or loses the db that it's interacting
with and sucks down all available memory.

> This has now happened for the second time. Before when it happened,
> about two weeks ago, I figured it was just a coincidence. Now, I’m
> positive that it’s SA-LEARN that is the culprit, either directly or
> indirectly.

Quite possible; tokenizing and processing more than a few messages at a
time is likely to occupy quite a bit of memory. I haven't seen problems
myself (learning ~600+ hams or spams on occasion), but I have recently
tried to make sure it doesn't auto-rebuild the .db files while learning;
instead I run a cron job to do so (currently daily).

> SA-Learn works just fine on a “small” corpus.
> However, there seems to be a “dead zone” in there -- maybe 200 emails
> or so -- where SA-Learn “hangs,” not only itself but the whole system.
> I’m talking TOTAL freeze-up, have to hard-reset, everything.

Hmmm... That's not good. Have you had more than one shell session open
while running sa-learn, to see what the system load, memory usage, etc.
are doing while it's running? Is this a point failure (200 emails +/-
5-10; no failures above or below this point) or just "too much data"
(failure on any 200+ message mbox)? I've had no problems (other than
general system load and wall clock time to complete the task) running
sa-learn on 600-800+ message mbox files, either spam or ham.

> Even worse, it makes hash out of the filesystems, and it takes several
> hard resets before I get rid of the “kernel panic” messages!
> It hoses stuff up like I’ve NEVER seen before with Linux!

This is starting to sound more and more like a "small" hardware issue
that only shows up under load. The *only* times I've consistently had
trouble with certain operations under Linux (aside from those times when
I've configured something incorrectly, and the software was doing
exactly what I had configured it to do, but not what I wanted it to do),
the hardware has been flaky to one degree or another.

> Something ain’t right here, I’m telling you. I don’t care if it is SA,
> Perl, whatever, there is no way that this should happen.

If you've got a test box you can sacrifice like this repeatedly,
sa-learn -D might provide a little more information that would actually
help solve the problem, or at least point a little closer to a solution.

If you just have the one production box (or that's the only place this
seems to happen), I'd suggest starting by putting together a tool to
split mbox files into 100-message chunks and running sa-learn on those
instead of continuing to attempt to learn larger mbox files.

"Doctor, it hurts when I do this."
"So don't do that."

-kgd
--
<erno> hm. I've lost a machine..
literally _lost_. it responds to ping, it works completely, I just can't figure out where in my apartment it is.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
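
For the 100-message chunking suggested above, something along these lines might do as a starting point. It is only a rough sketch in Python using the standard mailbox module; the function name split_mbox, the chunk-NNNN.mbox file naming and the chunk_size default are made up for illustration, and it has not been run against the corpus in question.

```python
import mailbox
import os


def split_mbox(src_path, dest_dir, chunk_size=100):
    """Copy the messages in src_path into numbered mbox files in dest_dir,
    each holding at most chunk_size messages. Returns the chunk paths.

    Hypothetical helper: the names and file layout are illustrative only.
    Note that an existing chunk file from an earlier run is appended to,
    so point dest_dir at an empty directory.
    """
    os.makedirs(dest_dir, exist_ok=True)
    chunk_paths = []
    chunk = None
    # create=False: fail loudly instead of creating an empty source mbox.
    source = mailbox.mbox(src_path, create=False)
    try:
        for index, message in enumerate(source):
            if index % chunk_size == 0:
                # Start a new chunk file; close() flushes the previous one.
                if chunk is not None:
                    chunk.close()
                name = "chunk-%04d.mbox" % (index // chunk_size)
                path = os.path.join(dest_dir, name)
                chunk = mailbox.mbox(path)
                chunk_paths.append(path)
            chunk.add(message)
    finally:
        if chunk is not None:
            chunk.close()
        source.close()
    return chunk_paths
```

Each resulting chunk can then be fed to sa-learn one at a time (for example with its --mbox option), which also makes it easier to see which chunk, if any, the failure follows.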