Usage PDMarkedContentExtractor

Fill Freeman Fri, 05 Sep 2014 06:18:58 -0700

Hello.
I'm new to PDFBox, and I have a question.
As I understand it, a PDF file can contain a kind of HTML-like markup. Of
course, not all PDF files use it. Is it possible to use the
PDMarkedContentExtractor class to extract such marked content?


For example: I have a PDF file with a table. I use the iTextRups utility to
browse the structure of the PDF file, and I see a Table node with TR and TD
child nodes. As I understand it, the structure elements refer to page
content through MCID markers, and PDMarkedContentExtractor should use them
to "extract" the text marked with a specified MCID. Am I right? If so, could
somebody show me a simple example of its usage? I have no idea how it is
supposed to work.
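
To make the MCID idea above concrete, here is a minimal sketch in plain Java.
It is not PDFBox code. It only illustrates how text in a page content stream
is bracketed by BDC ... EMC operators carrying an /MCID number, which is what
the TD nodes in the structure tree point at. The class name McidSketch, the
method textByMcid and the SAMPLE stream are all hypothetical and written by
hand for illustration. The regex scanner is a deliberate simplification: it
handles only the Tj operator and would be confused by operator names that
appear inside string literals. In PDFBox itself the extractor class is, as
far as I know, spelled PDFMarkedContentExtractor, and it returns
PDMarkedContent objects from getMarkedContents().

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: NOT PDFBox code and not a full content-stream parser.
final class McidSketch {

    // Hypothetical, hand-written content stream: two table cells plus a
    // page artifact that has no MCID.
    static final String SAMPLE =
          "/TD <</MCID 0>> BDC BT /F1 12 Tf 72 700 Td (Apples) Tj ET EMC\n"
        + "/TD <</MCID 1>> BDC BT /F1 12 Tf 200 700 Td (4) Tj ( kg) Tj ET EMC\n"
        + "/Artifact BMC BT 72 40 Td (Page 1) Tj ET EMC\n";

    private static final Pattern TOKEN = Pattern.compile(
          "<<[^>]*?/MCID\\s+(\\d+)[^>]*?>>\\s*BDC"  // group 1: BDC with an MCID
        + "|\\b(BMC|BDC)\\b"                         // group 2: begin without MCID
        + "|\\b(EMC)\\b"                             // group 3: end of a sequence
        + "|\\(((?:\\\\.|[^\\\\)])*)\\)\\s*Tj");    // group 4: (string) Tj

    // Collects the text shown by Tj, keyed by the nearest enclosing MCID.
    static Map<Integer, String> textByMcid(String contentStream) {
        Map<Integer, StringBuilder> collected = new LinkedHashMap<>();
        Deque<Integer> open = new ArrayDeque<>(); // -1 marks "no MCID"
        Matcher m = TOKEN.matcher(contentStream);
        while (m.find()) {
            if (m.group(1) != null) {
                open.push(Integer.parseInt(m.group(1)));
            } else if (m.group(2) != null) {
                open.push(-1);
            } else if (m.group(3) != null) {
                if (!open.isEmpty()) {
                    open.pop();
                }
            } else {
                // Iteration starts at the top of the stack (innermost sequence).
                int mcid = -1;
                for (int id : open) {
                    if (id >= 0) {
                        mcid = id;
                        break;
                    }
                }
                if (mcid >= 0) {
                    // Drop the backslash from simple escapes such as \( and \).
                    String text = m.group(4).replaceAll("\\\\(.)", "$1");
                    collected.computeIfAbsent(mcid, k -> new StringBuilder())
                             .append(text);
                }
            }
        }
        Map<Integer, String> result = new LinkedHashMap<>();
        collected.forEach((id, sb) -> result.put(id, sb.toString()));
        return result;
    }

    public static void main(String[] args) {
        // Prints one line per MCID found in the sample stream.
        textByMcid(SAMPLE).forEach((id, text) ->
            System.out.println("MCID " + id + ": " + text));
    }
}
```

Text inside the /Artifact sequence is skipped because no enclosing sequence
has an MCID. A real extractor also has to decode font encodings and handle
the TJ operator, which this sketch leaves out.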
