Extract underlying PDF code from PDF file by selecting an area

Stefan Falk Wed, 14 Jan 2015 12:43:46 -0800

Hello pdfbox people!

I was wondering whether anybody can help me. What I am looking for is a way to extract the underlying PDF code from a PDF file by simply selecting an area with the mouse.

After reading a bit about PDFs, I have learned that extracting anything from a PDF can be quite a hard task.

So I was wondering whether pdfbox could do that somehow. I've taken a rough look at the PDFReader and noticed that there is, e.g., processTextPosition in the class PageDrawer, which seems to allow me to get at least the position of text. Am I right in assuming that?
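
To make the idea concrete, here is a rough sketch of the selection step I have in mind. It uses only plain Java and no pdfbox calls: the Glyph type, the method names and all coordinates are made up for illustration. It also assumes an unscaled, unrotated page, so a real viewer would need more than the simple y-flip shown here.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types or methods come from pdfbox.
class AreaSelection {

    // A piece of text with its bounding box in PDF user space
    // (origin at the bottom-left of the page, y growing upwards).
    static final class Glyph {
        final String text;
        final Rectangle2D.Double box;

        Glyph(String text, double x, double y, double width, double height) {
            this.text = text;
            this.box = new Rectangle2D.Double(x, y, width, height);
        }
    }

    // Convert a mouse selection in screen coordinates (origin top-left,
    // y growing downwards) to PDF user space by flipping the y axis.
    // Assumption: the page is shown at 100% zoom and without rotation.
    static Rectangle2D.Double toPdfSpace(Rectangle2D.Double screen, double pageHeight) {
        double y = pageHeight - (screen.y + screen.height);
        return new Rectangle2D.Double(screen.x, y, screen.width, screen.height);
    }

    // Keep every glyph whose bounding box overlaps the selected area.
    static List<String> select(List<Glyph> glyphs, Rectangle2D.Double area) {
        List<String> hits = new ArrayList<>();
        for (Glyph g : glyphs) {
            if (area.intersects(g.box)) {
                hits.add(g.text);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("Hello", 72, 700, 30, 12)); // near the top of the page
        glyphs.add(new Glyph("World", 72, 100, 30, 12)); // near the bottom of the page

        // Selection dragged near the top of a 792 pt high page.
        Rectangle2D.Double screen = new Rectangle2D.Double(50, 70, 200, 40);
        System.out.println(select(glyphs, toPdfSpace(screen, 792)));
    }
}
```

If processTextPosition really delivers positions like these, the same overlap test could in principle be applied to whatever it reports, but that is exactly the part I am unsure about.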

My concrete question is: what is possible with pdfbox in this regard? For example, I have a PDF on my drive whose text seems to be "extractable" by pdfbox, yet the PDFReader is not able to render any of it. It just renders the images (see attachment).


Thank you for your help in advance!

Best regards,
Stefan
