IMO, a title is not something that can be extracted from a PDF deterministically and accurately. A best guess based on font size and position on the page may be as good as it gets.
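To make the font-size-and-position guess concrete, here is a minimal sketch in Python. It assumes you already have the text runs of one page together with their font size and vertical position; the `TextRun` shape, the `guess_page_title` name, and the 1.2 size ratio are all made up for illustration and are not tied to any particular PDF library. It is only a heuristic, so it will miss titles that are set in the same size as the body text.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class TextRun:
    """One run of text on a page (hypothetical shape, not a real library type)."""
    text: str
    font_size: float   # in points
    y_from_top: float  # distance from the top edge of the page, in points


def guess_page_title(runs: List[TextRun], size_ratio: float = 1.2) -> Optional[str]:
    """Return the most title-like run on a page, or None if nothing stands out.

    size_ratio is an arbitrary assumption: a run counts as a candidate only
    if its font is at least this many times the median font size on the page.
    """
    if not runs:
        return None
    # Treat the median font size as the body-text size.
    body_size = median(r.font_size for r in runs)
    candidates = [r for r in runs if r.font_size >= size_ratio * body_size]
    if not candidates:
        return None
    # Prefer the largest font; break ties by whichever sits closest to the top.
    best = min(candidates, key=lambda r: (-r.font_size, r.y_from_top))
    return best.text.strip()


page = [
    TextRun("Chapter 1", 18.0, 72.0),
    TextRun("It was a dark and stormy night.", 11.0, 110.0),
    TextRun("The rain fell in torrents.", 11.0, 124.0),
    TextRun("Page 3", 9.0, 760.0),
]
print(guess_page_title(page))
```

Running the same function over every page and collecting the non-empty results would give a rough chapter list, which then still needs a manual check.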
We are running into the same issue with form captions. It all depends on how the author marks up the original documents. We (technoracle) have done some good work in this area with predictive analysis.

Duane Nickull
***********************************
Technoracle Advanced Systems Inc.
Consulting and Contracting; Proven Results!
i. Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile
b. http://technoracle.blogspot.com
t. @duanechaos
"Don't fear the Graph! Embrace Neo4J"

On 2012-08-20 10:32 AM, "Jagadeesh N. Malakannavar" <[email protected]> wrote:

>Hi,
>
>I am looking for a techniques to extract page titles. For example, if PDF
>has chapter1, chapter2 .... I want to list chapter1, chapter2.
>I may convert to few pages text and few others to html format
>conditionally.
>
>--
>
>Thanks,
>Jagadeesh N.Malakannavar

