Re: Identify not visible characters - Overlapped characters

Manuel Aristarán Wed, 28 Dec 2016 12:37:14 -0800

Hi Paulo,

> On Dec 28, 2016, at 9:52 AM, [email protected] wrote:
> 
> Unfortunately, Tabula uses a totally different approach (image analysis)
> [...]


Sorry for going (sort of) off-topic, but that's not correct. In fact, Tabula 
does not support images. Thanks to PDFBox, it "mines" text and graphical 
elements, and uses a set of heuristics that attempt to reconstruct a tabular
structure.

> Tabula also do incoherent analysis when a table is larger than one page, for
> that reason Tabula is far from being a good tool for text extraction with
> correct positioning.

We always welcome bug reports (and patches!) :) [1]

Thanks!

[1] https://github.com/tabulapdf/tabula-java/issues


—
Manuel Aristarán <[email protected]>
http://jazzido.com




