Hi,

On 08.04.2013 10:14, Alexander Klenner wrote:
Hi Maruan,

thank you, I now have a first clue about what is happening. As you suggested, I
ran the ExtractImages command from the command line, which produces many
images. These are exactly the images I see on the pages I created with
convertToImage().

Using the ExtractText command from the command line, I get all the text from this PDF.
So for this particular PDF, convertToImage() somehow seems to return only what
"ExtractImages" returns.
I also tried PDFToImage with the nonSeq parameter; it produces exactly
the same semi-empty pages as my Java code.

So I conclude that for some PDFs convertToImage() returns text and images, while
for others it returns only images. Is this the expected behaviour?

All the PDFs I process have 'real' text, which is selectable and is not
covered by some sort of image layer of text (at least I think so).

I uploaded the PDF and the output of PDFToImage to 
https://www.dropbox.com/sh/inkcdahx4da1kzp/13bnj-BrZt
I ran a quick test and I can confirm the described behaviour. There aren't any
exceptions or other obvious issues. It looks like the embedded Type 1 fonts are
somehow problematic, but for now I don't have any clue why.

Cheers,

Alex

BR
Andreas Lehmkühler
