http://ad-publications.informatik.uni-freiburg.de/benchmark.pdf
A Benchmark and Evaluation for Text Extraction from PDF
PDFBox is the best in four categories, the worst in one (missing newlines),
and near the top in one (lack of errors). I have asked the authors to
name some of the files with missing newlines, and the two error files.
Tilman