http://ad-publications.informatik.uni-freiburg.de/benchmark.pdf

A Benchmark and Evaluation for Text Extraction from PDF

PDFBox is the best in 4 categories, the worst in one (missing newlines), and near the top in one (lack of errors). I have asked the authors to name some of the files affected by missing newlines, as well as the two files that produced errors.

Tilman



