Re: FuzzyOcr 2.3b release, broken with SA 3.1.0

decoder Sat, 26 Aug 2006 11:35:09 -0700

Hello,


I was just informed that the latest FuzzyOcr version, 2.3b, uses a
module from SpamAssassin that is only available in 3.1.4, not in
3.1.0. The missing module is Mail::SpamAssassin::Timeout. Currently,
the only way to fix this is to upgrade to 3.1.4. I am still unsure
whether I should add my own timeout code using alarm() just to support
3.1.0.

Maybe someone else here has a better idea :)



Chris



decoder wrote:
> Hello,
>
>
> I just uploaded FuzzyOcr 2.3b to the download site. If you find
> bugs or run into problems, please mail back :)
>
> The major changes are:
>
> - Added a configurable timeout (maximum runtime) for the plugin, to
> avoid any lockups/unwanted delays.
>
> - The default matching threshold (set in the config file) can now be
> overridden on a per-word basis in the wordlist.
>
> For example, if the wordlist contains:
>
> word1
> word2::0
> word3::0.2
>
>
> then word1 is matched with the default threshold set in the config
> file, word2 must be an exact match (threshold 0), and word3 is
> matched with a threshold of 0.2.
>
> This is especially useful for words that frequently trigger false
> positives, such as "penis", "money" or "news".
>
> Note that a word's tendency to produce false positives is not
> directly tied to its length: the word "buy" produces very few FPs
> compared to "penis" when both are matched with the same threshold.
>
> The FuzzyOcr.words.sample file contains some word-specific
> thresholds that I recommend.
>
> - The experimental MD5 database has been replaced by a custom hash
> database which is able to match very similar images.
>
> Often, you get the same image twice, or all your customers get the
> same spam mail. But even though the pictures look the same, they
> are not identical, which is why MD5 was useless. The newly
> introduced (self-invented) hash is able to recognize almost
> identical images based on features that I won't explain here, as
> that would make things easier for spammers :) If a message contains
> a picture previously registered in the database, the stored score
> is read back from the database, the message is immediately tagged
> with that score, and the plugin exits.
>
> - Some non-alpha->alpha translations are now applied to the gocr
> output to fix common misreadings, such as "i" being read as ";" or
> "a" as "8".
>
> - There are now 2 scores for broken images. The first is used when
> the picture is recognized as broken but giffix was able to correct
> the errors and produce output that can be scanned; the second is
> used if the image is unfixable (either too broken, or
> interlaced/animated and broken). The first is set lower than the
> second (2.5 vs. 5).
>
> - Various bugfixes
>
> TODO:
>
> - Write an external program to manage the database (add, remove and
> verify given pictures).
>
> - Rewrite the temp file system so that all external program
> operations work on files (saves memory).
>
>
> Another wish: I'd like to create a database to ship with the plugin
> so it can be used out of the box, but I do not have many samples
> here. It would be nice if you sent samples of common picture spam
> you get to my mail address, with "[picture sample]" in the subject.
> I will post here again once I have enough :).
>
>
> Thanks to Jorge Valdes, Michael Alan Dorman and UxBoD for finding
> bugs and sending improvement suggestions for this version.
>
> Chris
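
The two matching-related changes in the quoted notes (the gocr
clean-up and the per-word thresholds) can be sketched together as
follows. This is a hypothetical Python illustration, not FuzzyOcr's
Perl code: the translation table holds only the two misreadings named
in the notes, and treating a threshold as the maximum edit distance
divided by the word's length is an assumption (the notes only say
that 0 means an exact match). parse_wordlist, clean_ocr_text and
find_words are made-up names.

```python
# Hypothetical sketch, not FuzzyOcr's actual (Perl) implementation.

# Only the two misreadings named in the release notes; the plugin's real
# table is not given in the post.
OCR_FIXES = str.maketrans({";": "i", "8": "a"})


def clean_ocr_text(text):
    """Apply non-alpha -> alpha fixes to raw OCR output."""
    return text.translate(OCR_FIXES)


def parse_wordlist(text, default_threshold):
    """Return {word: threshold}; an entry `word::t` overrides the default."""
    thresholds = {}
    for entry in text.split():  # works for one-per-line or space-separated
        word, sep, value = entry.partition("::")
        thresholds[word] = float(value) if sep else default_threshold
    return thresholds


def edit_distance(a, b):
    """Levenshtein distance (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def find_words(ocr_text, thresholds):
    """Return wordlist entries that fuzzily match some token of ocr_text.

    ASSUMPTION: a threshold is the maximum edit distance divided by the
    word's length, so 0 demands an exact match.
    """
    tokens = clean_ocr_text(ocr_text).split()
    hits = []
    for word, limit in thresholds.items():
        if any(edit_distance(tok, word) / len(word) <= limit for tok in tokens):
            hits.append(word)
    return hits
```

Under this reading, "penls" matches penis::0.2 (one substitution over
five letters), while "m0ney" does not match money::0.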
