Re: ExtractText and docx

Henrik K Thu, 06 May 2021 21:59:14 -0700

On Thu, May 06, 2021 at 09:20:28PM -0400, Alex wrote:
> 
> Also, has anyone written any meta rules for use with ExtractText that
> they'd like to share? I'd like to block all PDF files that contain any
> type of javascript - malicious or otherwise. I'd also like to block
> all PDFs that are a single page and contain a single URL - that appears
> to be the vast majority of all malicious PDFs.


That's something for PDFInfo or the like.

ExtractText simply extracts text and pretends it's _part_ of the message
body (for body rules etc).  How would that retain any info about what is
"a single PDF page"?  You don't even know what the text was extracted
from.  Which is why I'm debating whether the whole plugin is useful at
all or just feeding Bayes crap.
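
To illustrate the point, here is a rough Python sketch.  It is not the
plugin's actual code; Attachment, extract_text and rendered_body are
made-up names for this example only.  It assumes extraction flattens an
attachment to plain text and appends that text to the body, which is
roughly the behaviour described above.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    # Hypothetical stand-in for a PDF attachment.
    filename: str
    pages: list  # text of each page, one string per page


def extract_text(att):
    # Flatten every page into one string; page boundaries are dropped.
    return " ".join(att.pages)


def rendered_body(body, attachments):
    # Append the extracted text to the body as if it had been typed there.
    return "\n".join([body] + [extract_text(a) for a in attachments])


one_page = Attachment("a.pdf", ["Click http://example.com/x now"])
two_page = Attachment("b.pdf", ["Click http://example.com/x", "now"])

body_a = rendered_body("See attached.", [one_page])
body_b = rendered_body("See attached.", [two_page])

# A body rule sees identical text in both cases: no page count,
# no filename, no hint that the text came from a PDF at all.
print(body_a == body_b)
```

Under these assumptions a one-page and a two-page PDF produce the same
body text, so "single page" is not something a body rule can test.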
