On Thu, May 06, 2021 at 09:20:28PM -0400, Alex wrote:
>
> Also, has anyone written any meta rules for use with ExtractText that
> they'd like to share? I'd like to block all PDF files that contain any
> type of JavaScript - malicious or otherwise. I'd also like to block
> all PDFs that are a single page and contain a single URL - that appears
> to be the vast majority of all malicious PDFs.

That's something for PDFInfo or the like. ExtractText simply extracts text and pretends it's _part_ of the message body (for body rules etc). How would that retain any info about what is "a single PDF page"? You don't even know what the text was extracted from. Which is why I'm debating whether the whole plugin is useful at all or is just feeding Bayes crap.
