Re: how to extract page titles

Duane Nickull Sun, 26 Aug 2012 10:39:33 -0700

IMO, a title is not something that can be extracted deterministically
from a PDF.  A best guess based on font size and positioning may be as
good as it gets.
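
A minimal sketch of that kind of best guess, in Python.  It assumes the
text has already been pulled out of the page as runs with a font size and
a vertical position; the Span structure, the guess_title function, the
792-point page height and the weighting are all hypothetical choices for
illustration, not part of any PDF library and not a description of
anyone's production code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One run of text on a page (hypothetical structure, not a library type)."""
    text: str
    font_size: float  # in points
    y_top: float      # distance from the top of the page, in points


def guess_title(spans: List[Span], page_height: float = 792.0) -> Optional[str]:
    """Return a best-guess title for one page, or None if there is no text."""
    candidates = [s for s in spans if s.text.strip()]
    if not candidates:
        return None

    def score(s: Span) -> float:
        # Larger fonts score higher; text nearer the top gets a small bonus.
        # The factor 2.0 is an arbitrary assumption, not a tuned value.
        position_bonus = 1.0 - (s.y_top / page_height)
        return s.font_size + 2.0 * position_bonus

    return max(candidates, key=score).text.strip()


page = [
    Span("Chapter 1", 24.0, 72.0),
    Span("It was a dark and stormy night.", 11.0, 120.0),
    Span("Page 3", 9.0, 760.0),
]
print(guess_title(page))  # prints "Chapter 1"
```

A heuristic like this will still pick the wrong line on pages where the
largest text is a pull quote or a figure label, which is why it is only a
best guess.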

We are running into the same issue with form captions.  It all depends on
how the author marks up the original documents.  We (Technoracle) have
done some good work in this area with predictive analysis.

Duane Nickull 
***********************************
Technoracle Advanced Systems Inc.
Consulting and Contracting; Proven Results!
i.  Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile
b. http://technoracle.blogspot.com
t.  @duanechaos
"Don't fear the Graph!  Embrace Neo4J"

On 2012-08-20 10:32 AM, "Jagadeesh N. Malakannavar"
<[email protected]> wrote:

>Hi,
>
>I am looking for a technique to extract page titles. For example, if a PDF
>has chapter1, chapter2, ... I want to list chapter1, chapter2.
>I may convert a few pages to text and a few others to HTML format,
>conditionally.
>
>-- 
>
>Thanks,
>Jagadeesh N.Malakannavar
