On 15.07.2017 at 13:22, Tilman Hausherr wrote:
http://ad-publications.informatik.uni-freiburg.de/benchmark.pdf
A Benchmark and Evaluation for Text Extraction from PDF
Interesting, some details I've already found:
- they used version 2.0.3
- they said iText is similar to PDFBox (page 7, upper right) ;-)
Andreas
PDFBox is the best in 4 categories, the worst in one (missing newlines), and
near the top in one (lack of errors). I have asked the authors to name some
of the files affected by missing newlines, as well as the two error files.
Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]