Hi, I'm seeing an increase in the amount of QR-code spam that isn't being
caught. I'm not even sure the attachments are being scanned with zbarimg.
Here's what I have in ExtractText.cf:
extracttext_external zbar /usr/bin/zbarimg -D {}
extracttext_use zbar .jpg .png .pdf .webp image/(?:jpeg|png) application/pdf
add_header all ExtractText-Uris _EXTRACTTEXTURIS_
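Here's an example of a base64-encoded PDF attachment from an email that
appears not to have been scanned. Note that it is declared as
application/octet-stream rather than application/pdf. Should I add
"application/octet-stream" to the extracttext_use line in addition to the
others?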
--===============5303414978067341145==
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="Telecommuting_Policy_2025-07-01.pdf"
JVBERi0xLjQKMSAwIG9iago8PAovVGl0bGUgKP7/AFIAZQBtAG8AdABlACAAVwBvAHIAawAgAFAA
bwBsAGkAYwB5ACAAfAAgAEkAbgB0AGUAcgBuAGEAbAAgAEgAUgAgAFAAbwByAHQAYQBsKQovQ3Jl
YXRvciAo/v8AdwBrAGgAdABtAGwAdABvAHAAZABmACAAMAAuADEAMgAuADYpCi9Qcm9kdWNlciAo
When I run zbarimg on the saved PDF directly, it does reveal the QR-code
link within the PDF.
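
One way to double-check what the plugin has to match against is to list each attachment's filename next to its declared MIME type. The following is a minimal sketch using only Python's standard email package; the function name list_attachments is hypothetical and nothing here is part of SpamAssassin or ExtractText. It assumes the whole message has been saved to disk and read in as raw bytes.

```python
import email
from email import policy


def list_attachments(raw_message: bytes):
    """Return (filename, declared MIME type, looks_like_pdf) per named part."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        name = part.get_filename()
        if name is None:
            continue  # skip body parts that have no filename
        # decode=True undoes the Content-Transfer-Encoding (base64 here)
        payload = part.get_payload(decode=True) or b""
        # Every PDF file starts with the magic bytes "%PDF-"
        found.append((name, part.get_content_type(), payload.startswith(b"%PDF-")))
    return found
```

For a message like the sample above, this should report a filename ending in .pdf, a declared type of application/octet-stream, and a true PDF flag, since the leading base64 characters JVBERi0x decode to %PDF-1. That makes the mismatch between the extension and the declared type visible at a glance.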
Also, it's very slow because it has to spawn the binary for every message.
Is there a way to keep it loaded in memory, or to use a library version, to
avoid doing this every time? Salespeople sometimes send a legitimate PDF to
50+ people at once, and zbarimg still has to be spawned for each copy, so
this could eventually amount to a denial of service.
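
To illustrate the kind of workaround in question for the 50+ recipient case: identical attachments could be scanned once and the result reused, keyed by a hash of the file contents. The sketch below is only an assumption about how that might look. It is not an ExtractText feature, and the names CACHE_DIR, run_zbarimg and cached_scan are hypothetical, as is the cache location. The -q and -D flags and the exit codes (0 for barcodes found, 4 for none found) follow the zbarimg man page.

```python
import hashlib
import subprocess
from pathlib import Path

# Hypothetical cache location; ExtractText knows nothing about it.
CACHE_DIR = Path("/var/cache/zbar-cached")


def run_zbarimg(path):
    """Spawn zbarimg once and return what it prints on stdout."""
    proc = subprocess.run(
        ["zbarimg", "-q", "-D", str(path)],
        capture_output=True,
        text=True,
    )
    # Exit 0 = barcode found, 4 = none found; anything else is an error
    if proc.returncode not in (0, 4):
        raise RuntimeError(f"zbarimg failed with status {proc.returncode}")
    return proc.stdout


def cached_scan(path, cache_dir=CACHE_DIR, scanner=run_zbarimg):
    """Return scanner output for path, reusing results keyed by SHA-256."""
    path = Path(path)
    cache_dir = Path(cache_dir)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    entry = cache_dir / digest
    if entry.exists():
        return entry.read_text()  # same bytes seen before: no spawn
    result = scanner(path)  # errors propagate and are never cached
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry.write_text(result)
    return result
```

A small wrapper script that prints cached_scan(sys.argv[1]) could in principle be named in extracttext_external in place of zbarimg. That still starts an interpreter per message, so whether it is a net win depends on how expensive the PDF rendering is. The cache would also need expiry and atomic writes before being used in production.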