What is the preferred way, if there is one, to determine whether a document (specifically a PDF) is of "binary nature"?
I am extracting the text of many PDF user manuals for Lucene indexing, and some of them yield "absurd binary terms" that I would like to omit.

Thx,
Clemens
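One possible heuristic is sketched below. It is an assumption, not something Lucene or a PDF library is known to provide. It measures what fraction of the characters in an extracted string (a single term or a whole page) look like ordinary text, and it flags the string as "binary" when that fraction drops below a threshold. Control characters, the Unicode replacement character U+FFFD, and private-use code points often show up when a PDF's font encoding cannot be mapped back to real characters, so they count against the string. The class name `BinaryTextHeuristic`, its methods, and the 0.85 threshold are all hypothetical and would need tuning on the actual manuals.

```java
import java.util.ArrayList;
import java.util.List;

public class BinaryTextHeuristic {

    // Hypothetical threshold: minimum share of text-like characters a string
    // needs in order to be treated as real text. Tune it on your own corpus.
    public static final double MIN_TEXT_RATIO = 0.85;

    // True if the character is something a human-readable manual would contain.
    public static boolean isTextLike(char c) {
        // Letters (any script), digits, and whitespace such as '\n' or '\t' are text.
        if (Character.isLetterOrDigit(c) || Character.isWhitespace(c)) {
            return true;
        }
        // U+FFFD is what decoders emit for undecodable bytes; other control
        // characters rarely appear in prose.
        if (c == '\uFFFD' || Character.isISOControl(c)) {
            return false;
        }
        // Remaining punctuation and symbols count as text, except private-use,
        // unassigned, and lone surrogate code units.
        int type = Character.getType(c);
        return type != Character.PRIVATE_USE
            && type != Character.UNASSIGNED
            && type != Character.SURROGATE;
    }

    // Fraction of text-like characters; an empty string counts as text.
    public static double textRatio(String s) {
        if (s.isEmpty()) {
            return 1.0;
        }
        int textLike = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isTextLike(s.charAt(i))) {
                textLike++;
            }
        }
        return (double) textLike / s.length();
    }

    public static boolean looksBinary(String s) {
        return textRatio(s) < MIN_TEXT_RATIO;
    }

    // Keeps only the terms that pass the heuristic, e.g. before indexing them.
    public static List<String> keepTextTerms(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!looksBinary(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(looksBinary("Insert the battery before first use.")); // false
        System.out.println(looksBinary("\u0001\u0002\uFFFD\uE000\u0003abc"));   // true
        // "\u00DC-St\u00FCck" is "Ü-Stück", written with escapes to avoid
        // source-encoding issues; it is kept because its letters are real letters.
        System.out.println(keepTextTerms(
            List.of("battery", "\uFFFD\uFFFD\u0007", "\u00DC-St\u00FCck")));
    }
}
```

The check can run once per extracted page (to skip whole pages that decoded badly) or once per term before it reaches the index. It will not catch garbage that happens to consist of ordinary ASCII letters, such as a random-looking `xQzv7k`; that would need a different signal, for example a dictionary or character n-gram check.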