Dear Hesham,
Thank you very much for your response!
The purpose of my question: I need to find out whether all fonts used inside
the PDF are embedded. But if a PDF contains only images and no text, I don't
need to check for embedded fonts. At the moment I'm doing this:
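import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public boolean containsText(String pdfFile) throws IOException {
    PDDocument document = PDDocument.load(pdfFile);
    try {
        PDFTextStripper stripper = new PDFTextStripper();
        // Extracts the text of the whole document, which is the slow part.
        String text = stripper.getText(document);
        return text != null && text.length() > 0;
    } finally {
        // Always release the document, even if extraction fails.
        document.close();
    }
}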
But if the document is very large, this method can take a while. As soon as
some text is found I could already return true, but I couldn't figure out how
to do that.
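A rough idea of what such an early exit could look like, following Erik's hint
further down in this thread about per-page resources: instead of extracting
all text, walk the pages and stop at the first one whose resources declare a
font. This is only a sketch under assumptions. PageResources and page() are
hypothetical stand-ins so that the example runs without PDFBox; in PDFBox the
font names would presumably come from each page's resource dictionary. Note
also that a declared font only suggests text; it does not prove that any
glyph is actually drawn on the page.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class EarlyExitFontCheck {

    /** Hypothetical stand-in for one page's resource dictionary. */
    public interface PageResources {
        /** Names of the fonts declared in this page's resources (may be empty). */
        Set<String> fontNames();
    }

    /**
     * Returns true as soon as one page declares a font.
     * Pages after the first hit are never pulled from the iterator.
     */
    static boolean anyPageDeclaresFont(Iterator<? extends PageResources> pages) {
        while (pages.hasNext()) {
            Set<String> fonts = pages.next().fontNames();
            if (fonts != null && !fonts.isEmpty()) {
                return true; // early exit: no need to look at the remaining pages
            }
        }
        return false;
    }

    /** Hypothetical helper that builds a fake page declaring the given font names. */
    static PageResources page(String... names) {
        Set<String> fonts = new LinkedHashSet<>(Arrays.asList(names));
        return () -> fonts;
    }

    public static void main(String[] args) {
        // An image-only page followed by two pages that declare fonts.
        List<PageResources> mixed = Arrays.asList(page(), page("F1"), page("F2"));
        System.out.println(anyPageDeclaresFont(mixed.iterator()));

        // A document consisting only of image pages.
        List<PageResources> imagesOnly = Arrays.asList(page(), page());
        System.out.println(anyPageDeclaresFont(imagesOnly.iterator()));
    }
}
```

Taking an Iterator rather than a List is deliberate: the caller can hand in
pages lazily, so pages after the first hit never have to be touched.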
Best Regards,
Andreas
-----Original Message-----
From: Hesham G. [mailto:[email protected]]
Sent: Thursday, February 4, 2010 07:47
To: [email protected]
Subject: Re: PDF contains any text?
I remember there was some way in PDFBox to read some resources from the PDF
and skip others. I don't remember how, but I think there is a way to skip
parsing images in the PDF.
Best regards,
Hesham
--------------------------------------------------
From: "Erik Scholtz, ArgonSoft GmbH" <[email protected]>
Sent: Wednesday, February 03, 2010 6:03 PM
To: <[email protected]>
Subject: Re: PDF contains any text?
> Andreas,
>
> telling something about a document's contents without parsing it
> sounds to me like you are looking for the PDDocument.oracle_of_delphi()
> method :)
>
> But to answer your question: No - you have to check the resources of
> each page to see whether it has text resources or not.
> There is no "central resource_available dictionary" in PDF.
>
>
> Best regards,
> Erik
>
> Roeder, Andreas wrote:
>> Hi,
>>
>> Is there a way to find out if a PDF contains any text without parsing
>> the whole document?
>> Some PDFs contain just images.
>>
>> Best Regards,
>>
>> Andreas
>>
>