On Mon, 26 Nov 2012, John Hardin wrote:

On Mon, 26 Nov 2012, Ed Flecko wrote:

 Hi folks,
 I'm running SpamAssassin version 3.3.2 (running on Perl version
 5.14.2) on FreeBSD 9.0.

 I've exported a bunch of spam and ham messages from my Barracuda 400.

What format did the Barracuda export the messages in? It might be possible to directly feed that to sa-learn if it exported them in one of the "standard" mailbox formats.

 I have an Excel .csv file of about 2500 spam messages and 2500 ham
 messages, and I'm wondering if I can supply those as a parameter to
 sa-learn? I've looked at the documentation
 (http://spamassassin.apache.org/full/3.2.x/doc/sa-learn.html) and I
 see that you can pass the file as a parameter, but I'm not clear how
 you'd do that and in what format the file needs to be? CAN it be a
 .csv or should it be something else?

sa-learn expects either Berkeley-style mailbox files (i.e. RFC-822-format messages separated by "From {stuff about sender}" lines) or mbox one-message-per-file format.

Oops. "maildir one-message-per-file format." Sorry.
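
If the CSV really is all you have, one option is to rebuild a Berkeley-style mailbox from it and hand that to sa-learn. The following is only a rough sketch using Python's standard mailbox module. The column names ("from", "subject", "body") and the sample rows are made up for illustration; a real Barracuda export may be laid out quite differently, so adjust them to whatever the file actually contains.

```python
import csv
import io
import mailbox
from email.message import EmailMessage

# Hypothetical CSV layout: one message per row, with "from", "subject"
# and "body" columns. This is an assumption, not the Barracuda format.
SAMPLE_CSV = """from,subject,body
spammer@example.com,Cheap pills,Buy now
friend@example.org,Lunch?,See you at noon
"""


def csv_to_mbox(csv_text, mbox_path):
    """Append each CSV row to an mbox file as one RFC-822-style message.

    Returns the number of messages written.
    """
    box = mailbox.mbox(mbox_path)  # creates the file if it does not exist
    count = 0
    try:
        for row in csv.DictReader(io.StringIO(csv_text)):
            msg = EmailMessage()
            msg["From"] = row["from"]
            msg["Subject"] = row["subject"]
            msg.set_content(row["body"])
            # mailbox writes the "From ..." separator line for us
            box.add(msg)
            count += 1
        box.flush()
    finally:
        box.close()
    return count
```

Calling csv_to_mbox(SAMPLE_CSV, "spam.mbox") should leave a file that sa-learn can read with something like "sa-learn --spam --mbox spam.mbox" (and "--ham" for the ham set). Bayes also looks at headers, so messages rebuilt from a CSV without their original headers will likely train less well than the raw messages would.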

If your mailboxes aren't hosted on Windows, then take a look at your inbox file in a text editor to get an idea of the file format. (try "vi $MAIL" if you use vi)

 I'm new to SpamAssassin, but (for those of you more familiar with the
 product) "teaching" SpamAssassin is TYPICALLY the first thing one
 would do before deploying it in a production environment, isn't
 it?

Not necessarily the first thing, but certainly done early on. SA does fairly well without Bayes, especially if you have DNSBLs and URIBLs enabled, so you don't necessarily need to get it trained before turning it on in production. You can cut down on spam while getting it trained up.

You should turn off autolearn until you've trained it and are sure Bayes is giving good results.
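
As a minimal sketch of what that looks like in configuration, assuming a site-wide local.cf (its location varies by install, so check where your port put it):

```
# local.cf
use_bayes 1          # keep Bayes itself enabled
bayes_auto_learn 0   # no autolearning until manual training looks sane
```

Once the manually trained database is giving sensible scores, bayes_auto_learn can be set back to 1.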

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [email protected]    FALaholic #11174     pgpk -a [email protected]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
  does quite what I want. I wish Christopher Robin was here."
                                          -- Peter da Silva in a.s.r
-----------------------------------------------------------------------
 29 days until Christmas


