> I guess that everyone knows that the support for CID-code fonts and Unicode
> mappings is one of the needed key features. But it's not that easy to
> implement, especially if you are not able to read those (Asian) texts. ;-)
> It is one of the issues on my "want-todo-list" ....

Yeah, fair enough. I can't read any Asian languages either, so I know the feeling :)

Is there some way to have PDFBox indicate that it can't handle a PDF correctly? For example, is there a function that verifies a document contains no CID fonts, or a method that throws an exception when it tries to read something it can't handle? I'd like to use PDFBox for the PDFs that it can handle, and fall back to pdftotext for the ones that it can't.
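
To make the idea concrete, here is a rough sketch of the fallback I have in mind. It is only an illustration under some assumptions: `TextExtractor`, `extractWithFallback` and `runPdftotext` are names I made up, the PDFBox side is left as a pluggable extractor because I don't know which call (if any) reports unsupported fonts, and "failure" is assumed to show up as an exception or as empty text. Garbled output from a CID font would not be caught by this check. It also assumes a `pdftotext` binary on the PATH and Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class ExtractionFallback {

    /** Hypothetical abstraction: anything that turns a PDF file into text. */
    public interface TextExtractor {
        String extract(Path pdf) throws Exception;
    }

    /**
     * Tries the primary extractor first. If it throws, or returns null or
     * blank text, the fallback extractor is used instead.
     */
    public static String extractWithFallback(Path pdf,
                                             TextExtractor primary,
                                             TextExtractor fallback) throws Exception {
        try {
            String text = primary.extract(pdf);
            if (text != null && !text.trim().isEmpty()) {
                return text;
            }
        } catch (Exception e) {
            // Assumption: the primary extractor signals trouble by throwing.
            System.err.println("Primary extractor failed: " + e.getMessage());
        }
        return fallback.extract(pdf);
    }

    /** Runs the external pdftotext tool and returns what it writes to stdout. */
    public static String runPdftotext(Path pdf) throws IOException, InterruptedException {
        // "-" as the output file makes pdftotext write the text to stdout.
        Process process = new ProcessBuilder("pdftotext", "-enc", "UTF-8", pdf.toString(), "-")
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        String output;
        try (InputStream in = process.getInputStream()) {
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with status " + exitCode);
        }
        return output;
    }
}
```

With something like this, the call would be `extractWithFallback(pdf, pdfBoxExtractor, ExtractionFallback::runPdftotext)`, where `pdfBoxExtractor` is whatever wrapper around PDFBox turns out to be appropriate. What I'm missing is a reliable way for that wrapper to notice that PDFBox could not handle the document.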

