Re: Improved OCR Plugin with approximate matching

2006-08-18 Thread Matthias Keller
decoder wrote: decoder wrote: Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like that, mistakes made by the OCR recognition or intentional obfuscation

Re: Improved OCR Plugin with approximate matching

2006-08-17 Thread decoder
decoder wrote: > Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like that, mistakes made by the OCR recognition or intentional obfuscations in the t

Re: Improved OCR Plugin with approximate matching

2006-08-13 Thread decoder
decoder wrote: > Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like that, mistakes made by the OCR recognition or intentional obfuscations in the t

Re: Improved OCR Plugin with approximate matching

2006-08-10 Thread Theo Van Dinter
On Thu, Aug 10, 2006 at 10:55:30AM -0700, Dave . wrote: > foreach my $p ( $pms->{msg}->find_parts("image") ) { > Does this mean the message must have the text "image" and/or "image/gif" within the body? Many of the "penny stock" spam gifs I get appear as follows: Generally speaking, RTM (Mail::S

Re: Improved OCR Plugin with approximate matching

2006-08-10 Thread decoder
Dave . wrote: > Give them code from Ocr.pm: --- foreach my $p ( $pms->{msg}->find_parts("image") ) { my ( $ctype, $boundary, $charset, $name ) = Mail::SpamAssassin::Util::parse_content_type( $p->get_header('content-type') ); i

RE: Improved OCR Plugin with approximate matching

2006-08-10 Thread Dave .
Give them code from Ocr.pm: --- foreach my $p ( $pms->{msg}->find_parts("image") ) { my ( $ctype, $boundary, $charset, $name ) = Mail::SpamAssassin::Util::parse_content_type( $p->get_header('content-type') ); if ( $ctype eq "image/gif" ) { open OCR, "

Re: Improved OCR Plugin with approximate matching

2006-08-10 Thread amosch . security
On Tue, Aug 08, 2006 at 12:43:24AM +0200, decoder wrote: > You can find a full description and an example in the wiki under: http://wiki.apache.org/spamassassin/FuzzyOcrPlugin Ideas for improvements or criticism are always welcome :) Hi, First, thanks for working on such a gr

Re: Improved OCR Plugin with approximate matching

2006-08-10 Thread decoder
Bill Landry wrote: > - Original Message - From: "Spamassassin List" <[EMAIL PROTECTED]> To: Sent: Wednesday, August 09, 2006 2:26 PM Subject: Re: Improved OCR Plugin with approximate matchin

Re: Improved OCR Plugin with approximate matching

2006-08-10 Thread Mathias Tauber
> > yum install libungif* will get both libungif and libungif-progs (which contains giffix) I'm using Debian (Sarge) and I think libungif-bin is the better package here. giflib-bin wants to install the packages libx11-6, xfree86-common, xlibs-data additionally. Which means 10MB more than inst

RE: Improved OCR Plugin with approximate matching

2006-08-09 Thread Rick Cooper
> -Original Message- > From: decoder [mailto:[EMAIL PROTECTED] > Sent: Wednesday, August 09, 2006 5:31 PM > To: Spamassassin List; users@spamassassin.apache.org > Subject: Re: Improved OCR Plugin with approximate matching > > [snip] > > According to google, lib

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread Bill Landry
- Original Message - From: "Spamassassin List" <[EMAIL PROTECTED]> To: Sent: Wednesday, August 09, 2006 2:26 PM Subject: Re: Improved OCR Plugin with approximate matching Spamassassin List wrote: decoder wrote: See http://wiki.apache.org/spamassassin/FuzzyOcrPlugi

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread decoder
Spamassassin List wrote: Spamassassin List wrote: decoder wrote: See http://wiki.apache.org/spamassassin/FuzzyOcrPlugin Major changes: Replaced imagemagick with netpbm, support png, invoked giffix for broken gifs,

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread Spamassassin List
Spamassassin List wrote: decoder wrote: See http://wiki.apache.org/spamassassin/FuzzyOcrPlugin Major changes: Replaced imagemagick with netpbm, support png, invoked giffix for broken gifs, detect image format with magic bytes and not by content-type, added various configuration options. I ins

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread decoder
Spamassassin List wrote: decoder wrote: See http://wiki.apache.org/spamassassin/FuzzyOcrPlugin Major changes: Replaced imagemagick with netpbm, support png, invoked giffix for broken gifs, detect image format with magic byte

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread Spamassassin List
decoder wrote: See http://wiki.apache.org/spamassassin/FuzzyOcrPlugin Major changes: Replaced imagemagick with netpbm, support png, invoked giffix for broken gifs, detect image format with magic bytes and not by content-type, added various configuration options. I install the above plugin, and

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread decoder
Expertsites, Inc. wrote: decoder wrote: See http://wiki.apache.org/spamassassin/FuzzyOcrPlugin Major changes: Replaced imagemagick with netpbm, support png, invoked giffix for broken gifs, detect image format with magic bytes and not

Re: Improved OCR Plugin with approximate matching

2006-08-09 Thread Expertsites, Inc.
decoder wrote: See http://wiki.apache.org/spamassassin/FuzzyOcrPlugin Major changes: Replaced imagemagick with netpbm, support png, invoked giffix for broken gifs, detect image format with magic bytes and not by content-type, added various configuration options. Feedback is welcome :) Chris
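
As an illustration of the "detect image format with magic bytes and not by content-type" change listed above, here is a minimal Perl sketch. It is not taken from the plugin: the helper name detect_image_format is hypothetical, and it only checks the well-known GIF, PNG and JPEG file signatures.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not plugin code):
# guess the image format from the leading bytes of the attachment
# instead of trusting the Content-Type header, which a sender can forge.
sub detect_image_format {
    my ($data) = @_;
    return 'gif'  if $data =~ /\AGIF8[79]a/;                 # "GIF87a" or "GIF89a"
    return 'png'  if $data =~ /\A\x89PNG\x0D\x0A\x1A\x0A/;   # 8-byte PNG signature
    return 'jpeg' if $data =~ /\A\xFF\xD8\xFF/;              # JPEG start-of-image marker
    return;                                                   # unknown format
}

my $fmt = detect_image_format("GIF89a\x01\x00\x01\x00");
$fmt = 'unknown' unless defined $fmt;
print "$fmt\n";    # prints "gif"
```

A part declared as text/plain but starting with one of these signatures would still be recognised as an image, which is presumably the point of not relying on the declared type.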

Re: Improved OCR Plugin with approximate matching

2006-08-08 Thread decoder
decoder wrote: > Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like that, mistakes made by the OCR recognition or intentional obfuscations in the t

Re: Improved OCR Plugin with approximate matching

2006-08-08 Thread John D. Hardin
On Tue, 8 Aug 2006, decoder wrote: > I only wanted to add a small note: I recently saw gifs that cannot be converted using imagemagick because they are either sloppily generated or intentionally partly corrupted. Please think about using giftopnm and jpegtopnm instead. If you have a better id

Re: Improved OCR Plugin with approximate matching

2006-08-08 Thread Marc Perkel
Perhaps corrupted gifs should be treated as spam? decoder wrote: Hello again, I only wanted to add a small note: I recently saw gifs that cannot be converted using imagemagick because they are either sloppily generated or intentionally partly

Re: Improved OCR Plugin with approximate matching

2006-08-08 Thread decoder
Matthias Keller wrote: > decoder wrote: >> Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like

Re: Improved OCR Plugin with approximate matching

2006-08-08 Thread Matthias Keller
decoder wrote: Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like that, mistakes made by the OCR recognition or intentional obfuscations in the text don't
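
To make the idea of fuzzy matching against OCR output concrete, here is a small self-contained Perl sketch. It uses a plain Levenshtein edit distance per word, which may well differ from what the plugin actually does; the function names and the 0.25 threshold are assumptions chosen for illustration only.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming Levenshtein distance: the number of
# single-character insertions, deletions and substitutions needed
# to turn $s into $t.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical helper: true if any word in $text is close enough to
# $keyword, where "close" means distance / keyword length <= $max_ratio.
sub fuzzy_word_match {
    my ($text, $keyword, $max_ratio) = @_;
    return 0 unless length($keyword);
    my $wanted = lc($keyword);
    for my $word (split /\s+/, lc($text)) {
        next unless length($word);
        return 1 if edit_distance($word, $wanted) / length($wanted) <= $max_ratio;
    }
    return 0;
}

# An OCR slip or deliberate obfuscation ("st0ck") still matches "stock".
my $hit = fuzzy_word_match("buy st0ck today", "stock", 0.25);
my $msg = $hit ? "match" : "no match";
print "$msg\n";    # prints "match"
```

Normalising the distance by the keyword length keeps short keywords from matching almost anything; the right threshold would have to be tuned against real OCR output.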

Re: Improved OCR Plugin with approximate matching

2006-08-08 Thread decoder
Hello again, I only wanted to add a small note: I recently saw gifs that cannot be converted using imagemagick because they are either sloppily generated or intentionally partly corrupted. Please think about using giftopnm and jpegtopnm instead. If yo

Re: Improved OCR Plugin with approximate matching

2006-08-07 Thread jdow
From: "uNiXpSyChO" <[EMAIL PROTECTED]> decoder wrote: Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like that, mistakes made by the OCR recognition or i

Re: Improved OCR Plugin with approximate matching

2006-08-07 Thread uNiXpSyChO
Seems to work... but I never see a score above 1.00. The docs say the default score is 4. Did I miss something?

Re: Improved OCR Plugin with approximate matching

2006-08-07 Thread uNiXpSyChO
decoder wrote: Hello there, I have improved the original OcrPlugin (found at http://wiki.apache.org/spamassassin/OcrPlugin), so it contains fuzzy matching. Like that, mistakes made by the OCR recognition or intentional obfuscations in the text don't