Re: PDF contains any text?

Roeder, Andreas Wed, 03 Feb 2010 22:59:51 -0800

Dear Hesham,

Thank you very much for your response!


The purpose of my question is: I need to find out if all fonts used inside the 
PDF are embedded. But if a PDF only contains images and no text, I don't need 
to check for embedded fonts. At the moment I'm doing this:

        public boolean containsText(String pdfFile) throws IOException {
                PDDocument document = PDDocument.load(pdfFile);
                try {
                        // Extracts the text of every page, so large files are slow.
                        PDFTextStripper stripper = new PDFTextStripper();
                        String text = stripper.getText(document);
                        return text != null && text.length() > 0;
                } finally {
                        // Always release the document, even if extraction fails.
                        document.close();
                }
        }

But if the document is very large, this method can take a while. As soon as 
some text is found, I could already return true, but I couldn't figure out how 
to do that.
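
One possible way to stop early is to extract text one page at a time and
return at the first page that yields anything. The sketch below shows only
that loop. The class name EarlyExitTextCheck and the pageText callback are
hypothetical names made up for illustration, not part of PDFBox; the comment
in the middle indicates how the callback might be backed by
PDFTextStripper.setStartPage / setEndPage, which is an assumption about how it
would be wired up rather than tested code. Note that the document still has to
be loaded first; only the text extraction is cut short.

```java
import java.util.function.IntFunction;

public class EarlyExitTextCheck {

    /**
     * Returns true as soon as one page yields non-blank text.
     * pageText (hypothetical callback) maps a 1-based page number
     * to the text extracted from that page.
     */
    public static boolean containsText(int pageCount, IntFunction<String> pageText) {
        for (int page = 1; page <= pageCount; page++) {
            String text = pageText.apply(page);
            if (text != null && !text.trim().isEmpty()) {
                return true; // stop here, later pages are never extracted
            }
        }
        return false;
    }

    // Assumption: with PDFBox, pageText could restrict a PDFTextStripper
    // to a single page before extracting:
    //     stripper.setStartPage(page);
    //     stripper.setEndPage(page);
    //     return stripper.getText(document);
    // getText throws IOException, so the lambda would have to wrap it.

    public static void main(String[] args) {
        // Stand-in for a four-page PDF whose first two pages are image-only.
        String[] pages = {"", "  \n", "Hello", "never read"};
        int[] calls = {0};
        boolean found = containsText(pages.length, p -> {
            calls[0]++;
            return pages[p - 1];
        });
        System.out.println(found + " after " + calls[0] + " pages");
        // prints "true after 3 pages"
    }
}
```

Trimming before the emptiness test is a choice made in this sketch, in case a
page without text still produces only whitespace; the original check on
text.length() could be kept instead.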

Best Regards,

Andreas


-----Original Message-----
From: Hesham G. [mailto:[email protected]] 
Sent: Thursday, 4 February 2010 07:47
To: [email protected]
Subject: Re: PDF contains any text?


I remember there was some way in PDFBox to read some resources from the PDF 
and skip others. I don't remember how, but I think there is a way to skip 
parsing images in the PDF.

Best regards,
Hesham
--------------------------------------------------
From: "Erik Scholtz, ArgonSoft GmbH" <[email protected]>
Sent: Wednesday, February 03, 2010 6:03 PM
To: <[email protected]>
Subject: Re: PDF contains any text?

> Andreas,
>
> Telling what a document contains without parsing its content sounds to
> me like you are looking for the PDDocument.oracle_of_delphi() method :)
>
> But to answer your question: No. To find that out, you have to look at
> the resources of each page and check whether there are text resources.
> There is no "central resource_available dictionary" in PDF.
>
>
> Best regards,
> Erik
>
> Roeder, Andreas wrote:
>> Hi,
>>
>> Is there a way to find out if a PDF contains any text without parsing
>> the whole document?
>> Some PDFs contain just images.
>>
>> Best Regards,
>>
>> Andreas
>>
>
