Hi,

On 20.08.2012 19:32, Jagadeesh N. Malakannavar wrote:
> Hi,
>
> I am looking for a technique to extract page titles. For example, if a PDF
> has chapter1, chapter2, ... I want to list chapter1, chapter2.
> I may then conditionally convert some pages to text and others to HTML.

A PDF doesn't know anything about the structure of the text. There is no concept of markup such as chapters, headings, or footers.

It might be possible to detect such special parts of a text with a more or less intelligent algorithm, but PDFBox doesn't provide such functionality.
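
To illustrate what such a heuristic could look like, here is a minimal Java sketch. It is not part of PDFBox. It assumes the page text has already been extracted into separate lines (for example with PDFBox's PDFTextStripper) and that a heading is a short line of its own starting with "Chapter" and a number. The class name HeadingGuesser, the method guessHeadings, and the pattern are hypothetical and would need tuning for real documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: guesses chapter headings from
// text lines that were already extracted from a PDF.
final class HeadingGuesser {

    // Assumption: a heading is a short line of its own that starts with
    // "chapter" (any case) and a number, e.g. "Chapter 1" or "chapter2: Intro".
    private static final Pattern HEADING =
            Pattern.compile("^chapter\\s*\\d+\\b.{0,60}$", Pattern.CASE_INSENSITIVE);

    // Returns the trimmed lines that look like chapter headings, in input order.
    static List<String> guessHeadings(List<String> lines) {
        List<String> headings = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (HEADING.matcher(trimmed).matches()) {
                headings.add(trimmed);
            }
        }
        return headings;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Chapter 1",
                "In chapter 2 we will see more details.",
                "  chapter2: Details  ",
                "Some body text.");
        // Only lines that start with "chapter" + number are reported.
        System.out.println(guessHeadings(lines));
    }
}
```

A purely textual rule like this will miss headings that do not follow the assumed naming scheme and may report false positives, so treat it as a starting point only.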


BR
Andreas Lehmkühler
