On Thu, 9 Jul 2009, Martin Gregorie wrote:
On Thu, 2009-07-09 at 08:50 -0400, Steve Bertrand wrote:
My question is, given that the messages have already been processed by
the 'cuda's (with their header stamps in place), am I damaging, or at
risk of confusing the learning process of SA when I classify these
messages as SPAM?
Not really answering your question, but I find it's helpful to strip SA
headers out of the message collection I use for testing private rules.
Here's a simple bash shell script fragment that does the job and does it
fairly fast:
========================================================================
for f in data/*.txt
do
    echo "Cleaning $f"
    gawk '
        BEGIN      { act = "copy" }
        /^X-Spam/  { act = "skip" }   # stop copying at an X-Spam line
        /^[A-WYZ]/ { act = "copy" }   # resume at a capital letter other than X
        {
            if (act == "copy")
                { print }
        }
    ' <"$f" >temp.txt
    mv temp.txt "$f"
done
========================================================================
...wouldn't that mangle wrapped X-Spam headers?
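Reading the fragment again: a folded continuation line starts with
whitespace, so it matches neither pattern and stays in "skip" along with
the X-Spam header it belongs to. The riskier cases look like these: any
other X- header directly after an X-Spam one is dropped too, and if
X-Spam is the last header, the blank separator and the body are dropped
until a line starts with a capital letter other than X. A minimal sketch
that tracks the header/body boundary instead, assuming one message per
file and LF line endings (strip_spam_headers is a hypothetical name, not
anything shipped with SA):

```shell
# Hypothetical helper: print the message in file $1 with X-Spam-* header
# fields removed, including their folded continuation lines.
# Assumes one message per file and LF line endings.
strip_spam_headers() {
    awk '
        in_body  { print; next }                 # body: copy verbatim
        /^$/     { in_body = 1; print; next }    # first blank line ends the headers
        /^[ \t]/ { if (!skip) print; next }      # continuation follows its header
                 { skip = (tolower($0) ~ /^x-spam/) }   # a new header field starts
        !skip    { print }
    ' "$1"
}

# Possible use, mirroring the loop above:
#   for f in data/*.txt; do
#       strip_spam_headers "$f" >"$f.tmp" && mv "$f.tmp" "$f"
#   done
```

Matching on the lowercased line is a design choice here, since header
field names are case-insensitive; lines in the body that happen to start
with X-Spam are left alone.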
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
North Korea: the only country in the world where people would risk
execution to flee to communist China. -- Ride Fast
-----------------------------------------------------------------------
11 days until the 40th anniversary of Apollo 11 landing on the Moon