Thanks, all :-)

I set up FuzzyOcr yesterday and it gives pretty good results.
I need some time to understand it all properly:
- sometimes, using spamc -R or spamassassin -t, I can see the FuzzyOcr filter reporting its score;
- but when I look in the spam folder at the report of a mail marked as spam, I can't see the FuzzyOcr text. It seems the mail is processed like this: the image is converted to text, which then goes through the normal filtering process as if the mail had been received as plain text.

Anyhow, here are some tips on how I did it, because it is not so obvious:

<<==

Note: I use a Red Hat-like distribution: http://whiteboxlinux.org
# cat /etc/whitebox-release
White Box Enterprise Linux release 3.0 (Liberation Respin 2)
# uname -a
Linux empereur.rungis 2.4.21-27.EL #1 Mon Feb 28 19:03:06 EST 2005 i686 i686 i386 GNU/Linux
# spamassassin -V
SpamAssassin version 3.0.4
  running on Perl version 5.8.0
with qmail as the MTA

## first reference: http://wiki.apache.org/spamassassin/FuzzyOcrPlugin (thanks to decoder)

## prerequisites:
## check whether the Perl module String::Approx is installed:
# perl -e 'use String::Approx'
If you get no error, the module is installed; otherwise build it like this:

wget http://search.cpan.org/CPAN/authors/id/J/JH/JHI/String-Approx-3.26.tar.gz
tar xvzf String-Approx-3.26.tar.gz
cd String-Approx-3.26
perl Makefile.PL
make
make test
make install
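
The module check above can be wrapped in a small helper and reused for other modules. This is only a sketch: the function name have_perl_module is made up for illustration (it is not part of SpamAssassin or FuzzyOcr), and it assumes perl is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given Perl module can be loaded,
# non-zero otherwise (including when perl itself is missing).
have_perl_module() {
    perl "-M$1" -e 1 >/dev/null 2>&1
}

if have_perl_module String::Approx; then
    echo "String::Approx: installed"
else
    echo "String::Approx: missing, build it from the tarball"
fi
```

Running it again after "make install" should print the "installed" line.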

## netpbm and the GIF library: check that the packages are installed
rpm -qa | grep -i netpbm
rpm -qa | grep -iE "giflib|libungif"

## gocr (from SourceForge)
wget http://ovh.dl.sourceforge.net/sourceforge/jocr/gocr-0.40.tar.gz
tar xvzf gocr-0.40.tar.gz
cd gocr-0.40
wget http://users.own-hero.net/~decoder/fuzzyocr/gocr-segfault.patch
patch -p0 < gocr-segfault.patch

./configure --with-netpbm=yes   (IMPORTANT: you must pass this option explicitly)
make
make examples
make install

ln -s /usr/local/bin/gocr /usr/bin/gocr   (I needed this because /usr/local/bin was not in the PATH)

## giftext source and patch
rpm -qf `type -p giftext`
rpm -qa | grep -i libungif
cp /usr/bin/giftext /usr/bin/giftext.ori   (keep a copy of the original file as *.ori)

wget http://users.own-hero.net/~decoder/fuzzyocr/giftext-segfault.patch
wget http://ovh.dl.sourceforge.net/sourceforge/libungif/libungif-4.1.4.tar.bz2
tar xjvf libungif-4.1.4.tar.bz2
cd libungif-4.1.4
(cd util; patch -p0 < ../../giftext-segfault.patch)
./configure
gmake (or make)
(I did not run gmake install; I just copied the giftext binary)
cp util/giftext /usr/bin/giftext
    Strange: the previous file was a binary executable, while the new one is only a shell script... but it works.
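
A possible explanation, not verified on this system: builds that use libtool often leave a wrapper script in the build directory and keep the compiled binary in a .libs/ subdirectory. The sketch below only tests whether a file is a script; the helper name is_script and the path util/.libs/giftext are assumptions, not something confirmed in this message.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the file starts with "#!",
# i.e. it is a script and not a compiled binary.
is_script() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "#!" ]
}

# Assumption: the real binary, if any, lives in util/.libs/ (libtool layout).
if is_script util/giftext && [ -x util/.libs/giftext ]; then
    echo "util/giftext is a wrapper; the compiled binary is util/.libs/giftext"
fi
```

If the message is printed, it may be safer to install the file from .libs/ (or run gmake install) than to copy the wrapper.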

## the FuzzyOcr plugin
cd /etc/mail/spamassassin/
wget http://users.own-hero.net/~decoder/fuzzyocr/fuzzyocr-2.1.tar.gz
tar xvzf fuzzyocr-2.1.tar.gz
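
Once everything is installed, a quick check that the helper programs are reachable can save some debugging. A minimal sketch: the function name missing_tools is made up, and the tool list is an assumption based on the programs mentioned in this message plus giftopnm from netpbm; adjust it to what your FuzzyOcr version actually calls.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list
# that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed tool list, not taken from the FuzzyOcr documentation.
missing=$(missing_tools gocr giftext jpegtopnm giftopnm)
if [ -n "$missing" ]; then
    echo "not found in PATH:" $missing
fi
```

No output means all listed programs were found.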

BUG:
the jpegtopnm shipped with my Red Hat-like distro does not accept the - (dash) argument for reading from its standard input,
so I wrote a wrapper that drops the dash (...wow...):
<<
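# cat /usr/bin/jpegtopnm
#! /bin/sh
# Wrapper: the original binary must first be renamed to jpegtopnm.ori
BINAIRE="/usr/bin/jpegtopnm.ori"
if [ "$1" = "-" ]; then
        # "-" means "read stdin", which the binary already does with no argument
        exec "$BINAIRE"
else
        exec "$BINAIRE" "$@"
fi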
>>

Otherwise, you get this error:
<<
# echo glassware.gif | jpegtopnm
Not a JPEG file: starts with 0x67 0x6c
# echo glassware.gif | jpegtopnm -
jpegtopnm: Can't open -.  Errno=No such file or directory(2).
>>
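
Note that "echo glassware.gif |" feeds the file name, not the image, to jpegtopnm, which is why the first error reports the bytes 0x67 0x6c ("g" and "l"). The second error is the one that matters. To find out whether a given converter has the same problem, something like the sketch below could be used; the function name rejects_dash is made up, and it relies only on the "Can't open -" message shown in the transcript.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given command, called with "-"
# and an empty stdin, fails with the "Can't open -" error.
rejects_dash() {
    # stderr goes into the pipe, stdout is discarded
    "$@" - </dev/null 2>&1 >/dev/null | grep -q "Can't open -"
}

if rejects_dash jpegtopnm; then
    echo "jpegtopnm does not understand '-': the wrapper is needed"
fi
```

The same call can be repeated for the other netpbm converters on the system.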
==>>

Many thanks to all.



jdow wrote:
From: "Spamassassin List" <[EMAIL PROTECTED]>

Spamassassin List wrote:
Stephane Bentebba wrote:
hi all,

I am more or less happy with my SpamAssassin configuration;
it has worked well for a year.
But I have a problem with a new kind of spam which easily gets
through it:
spam which has poor text, poor tokens, or none, and a subject
that is always changing.
The only thing that remains the same is the image incorporated in it.
It always gets a very low score (below 3).
The text on the image in the body is either "breaking news
concerning..." or "we have a runner !"
Would it be possible to find a solution?
Add / modify a test to look at the first bytes of an attachment and
recognize the image?
I can send you samples of this spam if you like... (I prefer not to
attach them)
Have a look at FuzzyOCR
http://wiki.apache.org/spamassassin/FuzzyOcrPlugin

Works very well for me - I'm using it in conjunction with ImageInfo,
and since I started using them those image spams get through VERY rarely
They will block legit emails too
How so?

I wouldn't expect any from FuzzyOCR but ImageInfo certainly has the chance to block legit mail.

Sorry, I meant ImageInfo plugin.. I have many legit emails blocked by this plugin.

Reduce its score and perhaps use it in meta rules.
{^_^}


--
Stéphane Bentebba
Maintenance Technician
Tel.: +33 (0)1.41.73.20.16
Fax: +33 (0)1.41.73.20.08
FPS France
Parc d'affaire Silic
43, rue de la Grosse Pierre
BP 40160
94.533 RUNGIS Cedex




