ExtractText and zbarimg

Alex Wed, 02 Jul 2025 06:47:40 -0700

Hi, I'm seeing an increase in QR-code spam that isn't being caught, and
I'm not even sure the attachments are being checked with zbarimg. Here's
what I have in ExtractText.cf:


extracttext_external    zbar            /usr/bin/zbarimg -D {}
extracttext_use         zbar            .jpg .png .pdf .webp image/(?:jpeg|png) application/pdf
add_header              all             ExtractText-Uris _EXTRACTTEXTURIS_

Here's an example of a base64-encoded PDF attachment from an email that
appears not to have been scanned. Note that it is sent with the content
type application/octet-stream. Should I add "application/octet-stream" to
the extracttext_use line in addition to the others?

--===============5303414978067341145==

Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="Telecommuting_Policy_2025-07-01.pdf"

JVBERi0xLjQKMSAwIG9iago8PAovVGl0bGUgKP7/AFIAZQBtAG8AdABlACAAVwBvAHIAawAgAFAA
bwBsAGkAYwB5ACAAfAAgAEkAbgB0AGUAcgBuAGEAbAAgAEgAUgAgAFAAbwByAHQAYQBsKQovQ3Jl
YXRvciAo/v8AdwBrAGgAdABtAGwAdABvAHAAZABmACAAMAAuADEAMgAuADYpCi9Qcm9kdWNlciAo

When I run zbarimg on the saved PDF directly, it does reveal the QR-code
link within the PDF.
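
For reference, the base64 body quoted above really is a PDF despite the
application/octet-stream label: its first characters decode to the "%PDF-"
magic bytes. A minimal Python sketch of that check follows; the function
name looks_like_pdf is made up for illustration and is not part of
SpamAssassin or zbar.

```python
import base64


def looks_like_pdf(b64_payload: str) -> bool:
    """Return True if a base64 payload decodes to data starting with %PDF-.

    Hypothetical helper for illustration only; not a SpamAssassin API.
    """
    # Drop line breaks and spaces that MIME encoders insert into the body.
    compact = "".join(b64_payload.split())
    # Eight base64 characters decode to exactly six bytes, enough for "%PDF-".
    head = compact[:8]
    if len(head) < 8:
        return False
    try:
        return base64.b64decode(head).startswith(b"%PDF-")
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False


# First line of the attachment body from the message above.
sample = (
    "JVBERi0xLjQKMSAwIG9iago8PAovVGl0bGUgKP7/AFIAZQBtAG8AdABlACAAVwBvAHIAawAgAFAA"
)
print(looks_like_pdf(sample))
```

This only confirms what the attachment is; it says nothing about whether
ExtractText selects the part by filename extension or by MIME type.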

Also, scanning is very slow because the binary has to be spawned for every
request. Is there a way to keep it loaded in memory, or to use a library
version, to avoid doing this every time? Sometimes salespeople send a
legitimate PDF to 50+ people at a time, and zbarimg is still spawned once
for each recipient, so this could eventually amount to a denial of service.
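
To make the 50-recipient case concrete, here is a rough Python sketch of
the kind of caching that would avoid repeated spawns: results are keyed by
the SHA-256 of the attachment bytes, so identical PDFs are scanned once.
Everything here is an assumption for illustration. ScanCache and
run_zbarimg are hypothetical names, the cache lives in one process only,
and nothing in the sketch is an existing SpamAssassin or zbar feature. The
only command line used is the one from the config above
(/usr/bin/zbarimg -D).

```python
import hashlib
import os
import subprocess
import tempfile
from collections import OrderedDict
from typing import Callable, List


def run_zbarimg(data: bytes) -> List[str]:
    """Hypothetical scanner: write the bytes to a temp file and spawn zbarimg."""
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        # Same binary and flag as in the extracttext_external line.
        proc = subprocess.run(
            ["/usr/bin/zbarimg", "-D", path],
            capture_output=True, text=True, check=False,
        )
        return proc.stdout.splitlines()
    finally:
        os.unlink(path)


class ScanCache:
    """Small LRU cache of scan results keyed by attachment hash."""

    def __init__(self, scanner: Callable[[bytes], List[str]],
                 max_entries: int = 1024) -> None:
        self._scanner = scanner
        self._max_entries = max_entries
        self._results: "OrderedDict[str, List[str]]" = OrderedDict()

    def scan(self, data: bytes) -> List[str]:
        key = hashlib.sha256(data).hexdigest()
        if key in self._results:
            # Cache hit: refresh recency and skip spawning the binary.
            self._results.move_to_end(key)
            return self._results[key]
        result = self._scanner(data)
        self._results[key] = result
        if len(self._results) > self._max_entries:
            # Evict the least recently used entry.
            self._results.popitem(last=False)
        return result
```

With cache = ScanCache(run_zbarimg), calling cache.scan(pdf_bytes) for 50
copies of the same attachment would spawn zbarimg once, provided all 50
are handled by the same process.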
