Re: how to extract page titles

Andreas Lehmkuehler Thu, 23 Aug 2012 09:15:41 -0700

Hi,


On 20.08.2012 19:32, Jagadeesh N. Malakannavar wrote:

Hi,

I am looking for a technique to extract page titles. For example, if a PDF
has chapter1, chapter2 ..., I want to list chapter1, chapter2.
I may conditionally convert a few pages to text and a few others to HTML format.

A PDF doesn't know anything about the structure of the text. There is no concept of markup such as chapters, headings, footers, etc.

Maybe it would be possible to detect some special parts of a text using a more or less intelligent algorithm, but PDFBox doesn't provide such functionality.
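
To give a rough idea of what such an algorithm could look like, here is a minimal sketch in plain Java. It is not part of PDFBox; the class name HeadingGuesser, the method guessHeadings, the "chapter <number>" pattern and the 60-character limit are all assumptions made up for illustration. It expects text that has already been extracted (for example with PDFTextStripper) and simply keeps short lines that start with "chapter" followed by a number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not a PDFBox class: guesses chapter headings
// from already extracted plain text using a simple line-based heuristic.
public class HeadingGuesser {

    // Assumption: headings look like "chapter1" or "Chapter 1 Introduction".
    private static final Pattern CHAPTER =
            Pattern.compile("^chapter\\s*\\d+\\b.*", Pattern.CASE_INSENSITIVE);

    // Assumption: a heading is rarely longer than this many characters.
    private static final int MAX_HEADING_LENGTH = 60;

    public static List<String> guessHeadings(String text) {
        List<String> headings = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            // Keep only short lines that start with "chapter <number>".
            if (trimmed.length() <= MAX_HEADING_LENGTH
                    && CHAPTER.matcher(trimmed).matches()) {
                headings.add(trimmed);
            }
        }
        return headings;
    }

    public static void main(String[] args) {
        String sample = "Chapter 1 Introduction\n"
                + "Some body text that mentions chapter 2 in passing.\n"
                + "chapter2\n"
                + "More body text.";
        for (String heading : guessHeadings(sample)) {
            System.out.println(heading);
        }
    }
}
```

A heuristic like this only works if the headings follow a predictable wording. It will miss headings that are set apart only by font size or position, since that information is lost in plain extracted text.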



BR
Andreas Lehmkühler
