Re: PDFText2 Plugin for PDF file scoring

2007-08-13 Thread James MacLean
James MacLean wrote, on 15/07/07 05:05 PM: Subject: Re: PDFText Plugin for PDF file scoring - PDFText2.pm for ver 3.2 From: James MacLean <[EMAIL PROTECTED]> Date: Sun, 15 Jul 2007 17:05:38 -0300 To: users@spamassassin.apache.org Theo Van Dinter wrote,

Re: Bye for good FuzzyOCR - SVN Performance

2007-07-28 Thread James MacLean
Steve West wrote, on 26/07/07 10:59 AM: decoder wrote: Try using the SVN Version (revision 132). This is basically the same as the latest 3.5.x release but some issues with SA 3.2.x were fixed. Best regards, Chris We are running SA 3.2.1 and just wondering if anyone using the SVN version

Re: PDFText - pdftotext from Xpdf 3.02 limitation

2007-07-17 Thread James MacLean
Hi JT, There is the expectation that if the author requested that a PDF not be copied, then the PDF is not to be copied. This is done by a password-protection mechanism that is applied when the PDF is saved and is stored in the PDF file. The author of Xpdf makes his position known on subverting this feature: h

PDFText - pdftotext from Xpdf 3.02 limitation

2007-07-17 Thread James MacLean
Hi Folks, Noticed that my bodies were not being parsed any more. Found out that spam was creating PDFs that are copy protected. Xpdf utils from 3.0 will present the text, but at least 3.02 reports the file is copy protected and does not parse it... Simple fix here was to compile a _special_

Re: pdf tools clarification? - PDFText

2007-07-16 Thread James MacLean
JT DeLys wrote, on 16/07/07 07:02 PM: Seems to me that, assuming I can get the prereqs for FuzzyOCR+pdf built correctly (working), that FuzzyOCR /for/ OCR plus PDFText2 for text might be a solid solution ... Wish I had your confidence :). PDFText2 is still too young to know if it holds up u

Re: PDFText Plugin for PDF file scoring - PDFText2.pm for ver 3.2

2007-07-16 Thread James MacLean
Michael Parker wrote, on 16/07/07 01:58 PM: Theo Van Dinter wrote: IMO, if people find this a useful enough feature of 3.2, it's a relatively trivial change in the code as I recall, so a bugzilla request to backport may get somewhere for a future 3.1 release. I would +1 a backport. M

Re: pdf tools clarification? - PDFText

2007-07-16 Thread James MacLean
JT DeLys wrote, on 16/07/07 06:36 PM: Hi, With PDFText2, the found text is added (rendered) to the main tests that SpamAssassin does. Do you mean to those tests defined in 80_additional.cf? or others? It means any test you do on the body of e-mail will test against this. for example,

Re: pdf tools clarification? - PDFText

2007-07-16 Thread James MacLean
JT DeLys wrote, on 16/07/07 02:14 PM: Hi, Could someone perhaps succinctly summarize the various & sundry anti-pdf-image-spam tools that are currently in play? PDFText -- works in 3.2, not 3.1 This one is my fault :(. PDFText _does_ work in 3.1 and that is where we are getting the most

Re: PDFText Plugin for PDF file scoring - PDFText2.pm for ver 3.2

2007-07-15 Thread James MacLean
Theo Van Dinter wrote, on 14/07/07 02:13 PM: On Sat, Jul 14, 2007 at 09:54:36AM -0300, James MacLean wrote: Where do I find information on hooking into post_message_parse()? Tried grepping in the module area with no luck :(. Certainly agree it would be better to get the text out and let

Re: PDFText Plugin for PDF file scoring - not for PDF images

2007-07-14 Thread James MacLean
Dallas Engelken wrote, on 14/07/07 12:17 AM: James MacLean wrote: Hi folks, Regrets if this is the wrong list. Wanted to be able to score on text found in PDF files. Did not see any obvious route, so made a plugin that calls XPDF's pdfinfo and pdftotext to get the text that is then s

PDFText Plugin for PDF file scoring - not for PDF images

2007-07-13 Thread James MacLean
Hi folks, Regrets if this is the wrong list. Wanted to be able to score on text found in PDF files. Did not see any obvious route, so made a plugin that calls XPDF's pdfinfo and pdftotext to get the text that is then scored. Sample local.cf could be: pdftotext_cmd /usr/local/bin/pdftotext