Identifying tables normally depends entirely on heuristics. A PDF content stream typically records only where glyphs and lines are drawn, not that they form a table. I estimate that for scientific documents alone hundreds of person-years have been spent trying to decode PDF streams into tables. There is not, and will not be, a universal solution.
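To give a flavour of what such a heuristic looks like, here is a minimal sketch in plain Java. It is only one of many possible guesses, not a PDFBox feature: it groups positioned text fragments into rows by their y coordinate and treats a run of consecutive rows with the same number of cells as a probable table. The names `TableHeuristic`, `Fragment`, `groupIntoRows` and `looksLikeTable` are made up for illustration. I assume you have already collected the fragments and their coordinates yourself, for example by subclassing PDFBox's `PDFTextStripper` and reading the `TextPosition` objects it passes to `writeString`. The tolerance and minimum row count are arbitrary and will need tuning per document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of PDFBox: a crude row/column heuristic.
class TableHeuristic {

    /** A piece of text with the page coordinates where it starts. */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Groups fragments into rows. A fragment joins the current row when its y
     * is within yTolerance of the first fragment of that row. Rows are
     * returned top to bottom, each sorted left to right.
     */
    static List<List<Fragment>> groupIntoRows(List<Fragment> fragments, double yTolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<List<Fragment>> rows = new ArrayList<>();
        List<Fragment> current = new ArrayList<>();
        double rowY = 0;
        for (Fragment f : sorted) {
            if (!current.isEmpty() && Math.abs(f.y - rowY) > yTolerance) {
                rows.add(current);
                current = new ArrayList<>();
            }
            if (current.isEmpty()) {
                rowY = f.y;
            }
            current.add(f);
        }
        if (!current.isEmpty()) {
            rows.add(current);
        }
        for (List<Fragment> row : rows) {
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
        }
        return rows;
    }

    /**
     * Guesses that a table is present when at least minRows consecutive rows
     * have the same number of cells, and that number is two or more.
     */
    static boolean looksLikeTable(List<List<Fragment>> rows, int minRows) {
        int run = 0;
        int previousCount = -1;
        for (List<Fragment> row : rows) {
            int count = row.size();
            if (count >= 2 && count == previousCount) {
                run++;
            } else {
                run = (count >= 2) ? 1 : 0;
            }
            previousCount = count;
            if (run >= minRows) {
                return true;
            }
        }
        return false;
    }
}
```

A guess like this breaks quickly on real documents (cells that wrap over several lines, empty cells, multi-column page layouts, justified prose that happens to split into the same number of fragments per line), which is why every tool ends up piling more heuristics on top.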
On Tue, Sep 30, 2014 at 7:57 AM, Borris Bonafort <[email protected]> wrote:

> Hi,
> How to identify table using PDFBOX. And extract text from it.
> Please help me with the idea.
>
> Thanks
> Borris

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

