On Wed, Feb 6, 2013 at 10:24 AM, kulbhushan singh <[email protected] > wrote:
> Hi Andreas,
>
> I did the Adobe test and it gives me the same junk characters as PDFBox. I
> also tried "Save as Text..." but the result is the same. In the PDF
> properties I found that the encoding is Identity-H. I googled this encoding
> and found that many others also have the same problem.
>

Identity-H is a problem. We will probably have to interpret the glyph P.

>
> In my PDF I am not even able to search for any text. Are OCR and glyphs my
> only options to extract text out of it? Or is there any other way to go
> about this?
>
> Regards,
> Kulbhushan
>

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
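
For anyone wondering why Identity-H tends to produce junk on extraction, here is a rough sketch of the idea. With Identity-H, each pair of bytes in a text string is taken directly as a glyph identifier (CID), so the bytes say which glyph to draw, not which character it is. An extractor usually needs a separate mapping table (typically the font's ToUnicode entry) to get back to text. This is not PDFBox code: `identity_h_cids`, `extract_text` and the mapping data are hypothetical names and values made up for illustration.

```python
from typing import Dict, List, Optional


def identity_h_cids(raw: bytes) -> List[int]:
    """Split a string drawn with an Identity-H font into 2-byte big-endian CIDs."""
    if len(raw) % 2:
        raise ValueError("Identity-H strings have an even number of bytes")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def extract_text(raw: bytes, to_unicode: Optional[Dict[int, str]]) -> str:
    """Map CIDs to text; U+FFFD marks every code with no known Unicode value."""
    table = to_unicode or {}
    return "".join(table.get(cid, "\ufffd") for cid in identity_h_cids(raw))


# Hypothetical data: in this made-up subset font, CIDs 0x0024 and 0x0025
# happen to be the glyphs for 'A' and 'B'.
raw = bytes.fromhex("00240025")
hypothetical_map = {0x0024: "A", 0x0025: "B"}

print(extract_text(raw, hypothetical_map))  # mapping present: text is recoverable
print(extract_text(raw, None))              # mapping missing: only placeholders
```

When the mapping is missing, as it appears to be in the PDF discussed here, the remaining options are the ones already mentioned: interpreting the glyph shapes or OCR.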

