This normally depends entirely on heuristics. I estimate that, for
scientific documents alone, hundreds of person-years have been spent trying
to decode PDF streams into tables. There is not, and will not be, a universal
solution.
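
To give an idea of what such a heuristic looks like, here is a minimal Java
sketch, not a general solution. It assumes you already have text fragments
with page coordinates; with PDFBox these could probably be collected by
subclassing PDFTextStripper and overriding writeString(String,
List<TextPosition>). The class TableHeuristicSketch, the Fragment type and
both thresholds are hypothetical names and values chosen for illustration,
not part of PDFBox. Fragments whose y values are close are treated as one
row, and a wide horizontal gap is treated as a column boundary.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: none of these names come from PDFBox.
final class TableHeuristicSketch {

    /** A piece of text with the position of its left edge and its width. */
    static final class Fragment {
        final float x, y, width;
        final String text;

        Fragment(float x, float y, float width, String text) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.text = text;
        }
    }

    /**
     * Groups fragments into rows when their y values lie within yTolerance
     * of the first fragment in the row, then starts a new cell wherever the
     * horizontal gap to the previous fragment exceeds minColumnGap.
     */
    static List<List<String>> toRows(List<Fragment> fragments,
                                     float yTolerance, float minColumnGap) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        // Step 1: cluster fragments into rows by vertical position.
        List<List<Fragment>> rows = new ArrayList<>();
        float rowY = 0f;
        for (Fragment f : sorted) {
            if (rows.isEmpty() || Math.abs(f.y - rowY) > yTolerance) {
                rows.add(new ArrayList<>());
                rowY = f.y;
            }
            rows.get(rows.size() - 1).add(f);
        }

        // Step 2: within each row, split into cells at wide horizontal gaps.
        List<List<String>> table = new ArrayList<>();
        for (List<Fragment> row : rows) {
            row.sort(Comparator.comparingDouble((Fragment f) -> f.x));
            List<String> cells = new ArrayList<>();
            StringBuilder cell = new StringBuilder();
            float previousRight = 0f;
            for (int i = 0; i < row.size(); i++) {
                Fragment f = row.get(i);
                if (i > 0) {
                    if (f.x - previousRight > minColumnGap) {
                        cells.add(cell.toString()); // gap is wide: close the cell
                        cell.setLength(0);
                    } else {
                        cell.append(' ');           // gap is narrow: same cell
                    }
                }
                cell.append(f.text);
                previousRight = f.x + f.width;
            }
            cells.add(cell.toString());
            table.add(cells);
        }
        return table;
    }
}
```

Even this toy version shows why no universal solution exists: the two
thresholds have to be tuned per document, and merged cells, wrapped cell
text or multi-column page layouts break the assumptions immediately.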



On Tue, Sep 30, 2014 at 7:57 AM, Borris Bonafort <[email protected]>
wrote:

> Hi,
>       How can I identify a table using PDFBox and extract the text from it?
> Please help me with an idea.
>
> Thanks
>  Borris
>



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
