On Thu, Sep 1, 2016 at 12:27 AM, Olivier <olivier.nic...@cs.ait.ac.th> wrote:
> Richard,
>
>> I am looking at Fuzzy ocr to detect more image spam and I had a couple
>> of questions;
>
> FuzzyOCR does not detect image spam per se, it detects spam text in an
> image. To classify image spam, you could consider image Cerberus that
> does a classification on images metadata (size, presence of text, etc.)
>
>> 1) Is this being used? Does it detect image spam, or should I be
>> looking at something else?
>
> Yes. No, maybe.
>
> I am running it, it does not do a very good job at extracting the text
> from the images. Then it uses its own list of keywords to detect spam:
> to me it's the biggest problem, it should push back the text to
> SpamAssassin and let SA rules decide what to do with it.
>

I do agree that the OCR program should be doing the OCR'ing, and the text filtering should be left to a program that does that for a living. In the modern, systemd world this is of course an ancient and outdated design philosophy.
>> 2) I'm getting some horny date spam coming through with just
>> images and text inside an image at the bottom. My bayes seems to be
>> scoring this with -1.90 Bayes_00. I keep sending this to my database
>> as spam but I'm not sure how many I need to feed it and I don't get
>> much. Are there any other means of feeding bayes with image spam (or
>> any spam really) from a source on the internet? Or is that a bad idea
>> since that's not my spam?
>
> The ideal plugin would be able to look at a picture and decide that it's
> a horny date :) I remember we once had a student that wanted to work on
> classifying pictures by the amount of flesh to decide whether it was a
> naked picture or not. But I don't think he ever succeeded.
>

I need to find where I saw this (it might even have been on Wikipedia, of all places), but China or some other country has a program that blocks images on the internet based on the amount of flesh. As a result, it would block a picture of a bunch of pigs feeding. Maybe it is the same guy?

>> 3) If I use Fuzzy OCR on FreeBSD, how does it get updated?
>
> I doubt FuzzyOCR ever gets updated, on FreeBSD or elsewhere.
>
>> 4) I installed it from the ports and I had to install tesseract
>> or I got a dependency warning message. Now I still get a warning -
>> warn: FuzzyOcr: Cannot find executable for gifinter - Is this normal?
>> How should I omit this error since I can't find gifinter in the ports
>> tree?
>
> gifinter used to be part of /usr/ports/graphics/giflib
> but the NEWS file mentions that:
> Version 5.0.1
> =============
> Retirements
> -----------
> * gifinter is gone. Use convert -interlace from the ImageMagick suite.
>
> In my case, I still have an old executable of gifinter lying around,
> but I think you would configure FuzzyOcr.cf with an appropriate line of
> the form:
>
> focr_bin_gifinter /usr/local/bin/convert -interlace and the needed
> parameters.
>
> Best regards,
>
> Olivier
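
For what it's worth, if FuzzyOcr turns out not to accept extra arguments on the focr_bin_gifinter line (I have not tested that), a small wrapper around ImageMagick's convert might be one way to stand in for the retired gifinter. The sketch below is only an illustration: the function names are made up, the "convert" binary is assumed to be on the PATH, and it does not claim to match the arguments FuzzyOcr actually passes to gifinter.

```python
import shutil
import subprocess

# Interlace scheme names accepted by ImageMagick's -interlace option.
INTERLACE_TYPES = {"None", "Line", "Plane", "Partition", "GIF", "JPEG", "PNG"}


def build_interlace_cmd(src, dst, scheme="GIF", convert_bin="convert"):
    """Return the argv list that rewrites src to dst with the given scheme.

    Hypothetical helper: the name and defaults are assumptions for this
    sketch, not part of FuzzyOcr or giflib.
    """
    if scheme not in INTERLACE_TYPES:
        raise ValueError(f"unknown interlace scheme: {scheme}")
    return [convert_bin, src, "-interlace", scheme, dst]


def reinterlace(src, dst, scheme="GIF", convert_bin="convert"):
    """Run convert on src; raises if the binary is missing or exits non-zero."""
    if shutil.which(convert_bin) is None:
        raise FileNotFoundError(f"{convert_bin} not found on PATH")
    subprocess.run(build_interlace_cmd(src, dst, scheme, convert_bin), check=True)
```

Calling reinterlace("in.gif", "out.gif") would write an interlaced copy, and scheme="None" would write a non-interlaced one. Which of the two FuzzyOcr expects from gifinter is something to check in the plugin source.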