On 15.07.2017 at 13:22, Tilman Hausherr wrote:
http://ad-publications.informatik.uni-freiburg.de/benchmark.pdf

A Benchmark and Evaluation for Text Extraction from PDF
Interesting, some details I've already found:

- they used 2.0.3
- they said iText is similar to PDFBox (page 7, upper right) ;-)

Andreas


PDFBox is the best in 4 categories, the worst in one (missing newlines), and near the top in one (lack of errors). I have asked the authors to name some of the files with missing newlines, and the two error files.

Tilman



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



