Eliminating super scripts while extracting text from pdf

Siva Kumar Ch Fri, 28 Mar 2014 11:35:17 -0700

Hi,

I am trying to extract text from a PDF and then process it. The extraction
itself works, but I cannot get much benefit from it because the extracted
text treats superscripts, usually numbers, as normal text.


When a word with a superscript is the last word of a sentence, the
superscript is placed after the period (.) in the extracted text.

Example: the word "test" with the superscript "super".
When it appears at the end of a sentence, it is extracted as:
"test.super"
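
The symptom above can be pictured as a post-processing problem on the
extracted string. The following is only a minimal sketch of a possible
stopgap, not a real fix: it assumes the superscripts are purely numeric and
always end up glued directly after the sentence-ending period. The function
name strip_glued_markers and the sample sentence are made up for
illustration and do not come from any PDF library.

```python
import re

# Heuristic (an assumption, not a general rule): a run of digits glued
# directly after "<letter>." and followed by whitespace or end of text is
# treated as a displaced superscript marker.
_GLUED_MARKER = re.compile(r"(?<=[A-Za-z]\.)\d+(?=\s|$)")


def strip_glued_markers(text: str) -> str:
    """Remove digit runs that directly follow a sentence-ending period."""
    return _GLUED_MARKER.sub("", text)


# Hypothetical sample mimicking the extraction result described above.
sample = "This is a test.1 The next sentence starts here."
print(strip_glued_markers(sample))
```

Decimal numbers such as 3.14 are left alone because the character before
the period must be a letter, but text like "v.2" would be damaged, and
non-numeric superscripts are not caught at all. That is why a way to detect
superscripts during extraction would be preferable.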

Is there any way I can get rid of superscripts?

-- 
Br,
Siva.
