Hi Paulo,

> On Dec 28, 2016, at 9:52 AM, [email protected] wrote:
>
> Unfortunately, Tabula uses a totally different approach (image analysis)
> [...]

Sorry for going (sort of) off-topic, but that's not correct. In fact, Tabula does not support images. Thanks to PDFBox, it "mines" text and graphical elements, and uses a set of heuristics that attempt to reconstruct a tabular structure.

> Tabula also do incoherent analysis when a table is larger than one page, for
> that reason Tabula is far from being a good tool for text extraction with
> correct positioning.

We always welcome bug reports (and patches!) :) [1]

Thanks!

[1] https://github.com/tabulapdf/tabula-java/issues

--
Manuel Aristarán <[email protected]>
http://jazzido.com

