Hi,
On 20.08.2012 19:32, Jagadeesh N. Malakannavar wrote:
Hi,
I am looking for techniques to extract page titles. For example, if a PDF
has chapter1, chapter2, ..., I want to list chapter1, chapter2.
I may conditionally convert a few pages to text and a few others to HTML format.
A PDF doesn't know anything about the structure of the text. There is no concept
of markup, like chapters, headings, footers, etc.
Maybe it would be possible to detect some special parts of a text using a more
or less intelligent algorithm, but PDFBox doesn't provide such functionality.
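
To illustrate what such an algorithm might look like, here is a minimal sketch in plain Java. It is only a heuristic, not something PDFBox offers: it assumes you have already extracted each line of text together with the font size it was drawn with, and it guesses that short lines set noticeably larger than the body text are headings. The names HeadingGuesser, Line and guessHeadings, as well as the ratio and length thresholds, are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeadingGuesser {

    /** One extracted line of text plus its font size (hypothetical input type). */
    public static final class Line {
        final String text;
        final double fontSize;

        public Line(String text, double fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    /** Median font size over all lines, taken as the size of the body text. */
    public static double medianFontSize(List<Line> lines) {
        double[] sizes = new double[lines.size()];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = lines.get(i).fontSize;
        }
        Arrays.sort(sizes);
        int mid = sizes.length / 2;
        return sizes.length % 2 == 1
                ? sizes[mid]
                : (sizes[mid - 1] + sizes[mid]) / 2.0;
    }

    /**
     * Guesses headings: non-empty lines of at most maxLength characters whose
     * font size is at least ratio times the median font size.
     */
    public static List<String> guessHeadings(List<Line> lines, double ratio, int maxLength) {
        List<String> headings = new ArrayList<>();
        if (lines.isEmpty()) {
            return headings;
        }
        double bodySize = medianFontSize(lines);
        for (Line line : lines) {
            String text = line.text.trim();
            boolean shortEnough = !text.isEmpty() && text.length() <= maxLength;
            if (shortEnough && line.fontSize >= bodySize * ratio) {
                headings.add(text);
            }
        }
        return headings;
    }

    public static void main(String[] args) {
        // Invented sample data standing in for text extracted from a PDF.
        List<Line> page = Arrays.asList(
                new Line("Chapter 1", 18.0),
                new Line("Some body text of the first chapter.", 10.0),
                new Line("More body text.", 10.0),
                new Line("Chapter 2", 18.0),
                new Line("Final paragraph.", 10.0));
        System.out.println(guessHeadings(page, 1.2, 60)); // prints [Chapter 1, Chapter 2]
    }
}
```

A heuristic like this will misfire on documents where headings differ from the body only by weight or position, so the thresholds would need tuning per document.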
BR
Andreas Lehmkühler