If you have any recommendations for the more general case, let us know on TIKA-1443 [1].
[1] https://issues.apache.org/jira/browse/TIKA-1443

-----Original Message-----
From: Wouter De Borger [mailto:wouter.debor...@inmanta.com]
Sent: Thursday, March 30, 2017 6:00 AM
To: users@pdfbox.apache.org
Subject: Make PDFBox fail on bad pdf

Hi All,

When a PDF has bad encoding, PDFBox produces garbage (as explained in the FAQ https://pdfbox.apache.org/2.0/faq.html#gibberish).

Can I make PDFBox fail in this case instead of producing garbage?

(I'm working on a system that can also do OCR, so at the least sign of trouble, I would like PDFBox to fail and try OCR.)

Thanks,
Wouter