Re: Am I fscking up my bayes db?

John Hardin Thu, 09 Jul 2009 08:09:25 -0700

On Thu, 9 Jul 2009, Martin Gregorie wrote:

On Thu, 2009-07-09 at 08:50 -0400, Steve Bertrand wrote:

My question is, given that the messages have already been processed by
the 'cuda's (with their header stamps in place), am I damaging, or at
risk of confusing the learning process of SA when I classify these
messages as SPAM?


Not really answering your question, but I find its helpful to strip SA
headers out of the message collection I use for testing private rules.
Here's a simple bash shell script fragment that does the job and does it
fairly fast:

========================================================================
for f in data/*.txt
do
       echo "Cleaning $f"
       gawk '
               BEGIN           { act = "copy" }
               /^X-Spam/       { act = "skip" }
               /^[A-WYZ]/      { act = "copy" }
                               {
                                 if (act == "copy")
                                       { print }
                               }
       ' <$f >temp.txt
       mv temp.txt $f
done
========================================================================


...wouldn't that mangle wrapped X-Spam headers?

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  North Korea: the only country in the world where people would risk
  execution to flee to communist China.                  -- Ride Fast
-----------------------------------------------------------------------
 11 days until the 40th anniversary of Apollo 11 landing on the Moon

Re: Am I fscking up my bayes db?

Reply via email to