Hi Manuel,

Thank you for the fast response. I will investigate Tabula.

Regards,
Dane

Dane Bezuidenhout
SprintHive <https://sprinthive.com/>
M: +27 82 562 7850
vCard <http://www.sprinthive.com/files/dane.vcf>

On Tue, Jul 18, 2017 at 5:31 PM, Manuel Aristarán <[email protected]> wrote:

> Hi Dane,
>
> As you might know, there's no such thing as tables in PDF files. The only
> way to extract them is to try to reconstruct the tabular arrangement from
> the characters' positions, ruling lines, and so on. I'm one of the
> maintainers of Tabula [1], which is a tool based on PDFBox that implements
> a number of algorithms to attempt that. We have a GUI tool [2], and a Java
> library [3]. Both are open source (MIT license).
>
> Best,
>
> [1] http://tabula.technology
> [2] https://github.com/tabulapdf/tabula
> [3] https://github.com/tabulapdf/tabula-java
>
> --
> Manuel Aristarán
> jazzido.com
>
> On Tue, Jul 18, 2017 at 9:28 AM, Dane Bezuidenhout <[email protected]> wrote:
>
> > The examples available are clear on constructing a table, but there is
> > little info on reading a table. I've investigated a few solutions to this,
> > but feel that they are "hacky" in that they rely on establishing column and
> > row regions to read text from.
> >
> > Surely there is a canonical way to traverse the PDDocument table elements
> > and access table cells with reference to rows and columns?
> >
> > Any advice would be appreciated.
> >
> > Dane Bezuidenhout
> > SprintHive <https://sprinthive.com/>
> > M: +27 82 562 7850
> > vCard <http://www.sprinthive.com/files/dane.vcf>

