-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,
I was just informed that the latest FuzzyOcr version, 2.3b, uses a
module from SA (SpamAssassin) which is only available in 3.1.4, not in
3.1.0. The missing module is Mail::SpamAssassin::Timeout. Currently,
the only way to fix this is to upgrade to 3.1.4.

I am still unsure whether I should add my own timeout handling with
alarm() just to support 3.1.0. Maybe someone else here has a better
idea :)

Chris

decoder wrote:
> Hello,
>
> I just uploaded FuzzyOcr 2.3b to the download site. If you find
> bugs or run into problems, please mail back :)
>
> The major changes are:
>
> - Added a configurable timeout (maximum runtime) for the plugin, to
>   avoid any lockups/unwanted delays
>
> - The default matching threshold (set in the config file) can now be
>   overridden on a per-word basis in the wordlist
>
> An example, wordlist contains:
>
> word1 word2::0 word3::0.2
>
> Then word1 is matched with the default threshold set in the config
> file, word2 must be an exact match (threshold 0), and word3 is
> matched with a threshold of 0.2.
>
> This is especially useful for words which trigger false positives
> very often, like "penis", "money" or "news".
>
> Note that the tendency to produce a FP is not directly connected to
> the word length. The word "buy" produces very few FPs compared to
> "penis" when both are matched with the same threshold.
>
> The FuzzyOcr.words.sample file contains some suggestions for
> word-specific thresholds, which I recommend.
>
> - The experimental MD5 database has been replaced by a custom hash
>   database which is able to match very similar images.
>
> Often, you get the same image twice, or all your customers get the
> same spam mail. But even though the pictures look the same, they
> are not identical. That is why MD5 was useless.
> The newly introduced hash (self-invented) is able to recognize almost
> identical images based on features that I won't explain here, as it
> would make things easier for spammers :) If a message contains a
> picture previously registered in the database, the original score
> is reread from the database, the message is immediately tagged
> with this score, and the plugin ends.
>
> - Some non-alpha->alpha translations are now applied to the gocr
>   output to fix common mistakes, like "i" being misread as ";" or
>   "a" as "8".
>
> - There are now 2 scores for broken images: one is used when the
>   picture is recognized as broken but giffix was able to correct the
>   errors and it gave some output that can be scanned; the other one
>   is used if the image is unfixable (that is, either too broken, or
>   interlaced/animated and broken). The first one is set lower than
>   the second one (2.5 vs. 5).
>
> - Various bugfixes
>
> TODO:
>
> - Write an external program to manage the database (add, remove and
>   verify given pictures).
> - Rewrite the temp file system to do all external program operations
>   on files (saves memory).
>
> Another wish: I'd like to create a database to ship with the plugin
> so it can be used out of the box, but I do not have many samples
> here, so it would be nice if you sent me picture samples of common
> picture spam you get, with "[picture sample]" in the subject, to my
> mail address. I will post here again once I have enough :).
>
> Thanks to Jorge Valdes, Michael Alan Dorman and UxBoD for finding
> bugs and sending improvement suggestions for this version
>
> Chris

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE8JRFJQIKXnJyDxURAgY1AJ97hGp6zw94H+eUCeH2lay9T2mVDgCdFWEE
4VOwP8X4yVlPguHD6S1m9tI=
=ufN9
-----END PGP SIGNATURE-----
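
For anyone who wants to check a wordlist by hand, here is a rough sketch of how the per-word threshold syntax from the announcement (word, word::0, word::0.2) could be read. It is written in Python for illustration only and is not the plugin's own Perl code. The function name parse_wordlist and the default threshold of 0.25 are assumptions made up for this example; the real default is whatever is set in your config file.

```python
def parse_wordlist(text, default_threshold):
    """Map each wordlist entry to its matching threshold.

    Entries are separated by whitespace. An entry of the form
    "word::value" carries its own threshold; a bare "word" falls back
    to default_threshold. A malformed value (e.g. "word::abc") raises
    ValueError. (Hypothetical helper, not part of FuzzyOcr.)
    """
    thresholds = {}
    for entry in text.split():
        word, sep, value = entry.partition("::")
        thresholds[word] = float(value) if sep else default_threshold
    return thresholds


if __name__ == "__main__":
    # 0.25 is an assumed default, chosen only for this example.
    print(parse_wordlist("word1 word2::0 word3::0.2", default_threshold=0.25))
```

With the example wordlist from the announcement, word1 ends up with the assumed default, word2 with 0 (exact match) and word3 with 0.2.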