On Thu, Sep 1, 2016 at 12:27 AM, Olivier <olivier.nic...@cs.ait.ac.th> wrote:
> Richard,
>
>> I am looking at FuzzyOCR to detect more image spam and I have a couple
>> of questions:
>
> FuzzyOCR does not detect image spam per se; it detects spam text in an
> image. To classify image spam, you could consider Image Cerberus, which
> classifies images based on their metadata (size, presence of text, etc.).
>
>> 1) Is this being used? Does it detect image spam, or should I be
>> looking at something else?
>
> Yes. No, maybe.
>
> I am running it, but it does not do a very good job of extracting the
> text from the images. It then uses its own list of keywords to detect
> spam. To me that is the biggest problem: it should pass the text back to
> SpamAssassin and let the SA rules decide what to do with it.
>
      I do agree that the OCR program should do the OCR'ing and that
the text filtering should be left to a program that does that for a
living. In the modern systemd world, this is of course an ancient and
outdated design philosophy.

>> 2) I'm getting some "horny date" spam coming through that contains
>> only images, with text inside an image at the bottom. Bayes seems to
>> be scoring these as BAYES_00 (-1.90). I keep feeding them to my
>> database as spam, but I'm not sure how many I need to feed it, and I
>> don't get many. Is there any other way to feed Bayes with image spam
>> (or any spam, really) from a source on the internet? Or is that a bad
>> idea, since it's not my spam?
>
> The ideal plugin would be able to look at a picture and decide that it's
> a horny date :) I remember we once had a student who wanted to work on
> classifying pictures by the amount of flesh, to decide whether a
> picture was a nude or not, but I don't think he ever succeeded.
>
      I need to find where I saw this (it might even have been on
Wikipedia, of all places), but China or some other country has a
program that blocks images on the internet based on the amount of
flesh. As a result, it would block a picture of a bunch of pigs
feeding. Maybe it is the same guy?

>> 3) If I use FuzzyOCR on FreeBSD, how does it get updated?
>
> I doubt FuzzyOCR ever gets updated, on FreeBSD or elsewhere.
>
>> 4) I installed it from ports, and I had to install tesseract or I
>> got a dependency warning. Now I still get a warning:
>> "warn: FuzzyOcr: Cannot find executable for gifinter". Is this normal?
>> How can I get rid of this warning, since I can't find gifinter in the
>> ports tree?
>
> gifinter used to be part of /usr/ports/graphics/giflib,
> but the giflib NEWS file says:
> Version 5.0.1
> =============
> Retirements
> -----------
> * gifinter is gone.  Use convert -interlace from the ImageMagick suite.
>
> In my case, I still have an old gifinter executable lying around,
> but I think you would configure FuzzyOcr.cf with an appropriate line of
> the form:
>
> focr_bin_gifinter /usr/local/bin/convert -interlace
>
> followed by the needed parameters.
>
> Best regards,
>
> Olivier
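Before touching the config, it may help to confirm which helper programs are actually installed. This is a minimal sketch, assuming a POSIX `sh`. The function name `check_helpers` is hypothetical. The programs checked (`gifinter`, `convert`, `tesseract`) are only the ones mentioned in this thread, not FuzzyOCR's full dependency list:

```shell
#!/bin/sh
# Hypothetical helper: report whether each named program is on PATH.
# The names passed below are only the ones mentioned in this thread;
# adjust them to match the helpers your FuzzyOCR configuration uses.
check_helpers() {
  for prog in "$@"; do
    # command -v prints the resolved path and succeeds if prog is found
    if found=$(command -v "$prog" 2>/dev/null); then
      echo "$prog: $found"
    else
      echo "$prog: not found"
    fi
  done
}

check_helpers gifinter convert tesseract
```

If `gifinter` reports `not found` while `convert` resolves to a path, that matches the giflib 5 situation described above.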
