Erik,
I wrote the following code:
public boolean containsText() throws IOException {
    PDDocument document = null;
    try {
        document = PDDocument.load(pdfFile);
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        List<PDPage> pages = document.getDocumentCatalog().getAllPages();
        for(PDPage page : pages) {
            PDRectangle rectancle = page.getTrimBox();
            Rectangle2D.Float awtRect = new Rectangle2D.Float(
                    rectancle.getLowerLeftX(), rectancle.getUpperRightY(),
                    rectancle.getWidth(), rectancle.getHeight());
            stripper.addRegion(page.toString(), awtRect);
            stripper.extractRegions(page);
            for(Object regionObj : stripper.getRegions()) {
                String regionName = (String) regionObj;
                String text = stripper.getTextForRegion(regionName);
                if(text != null && text.length() > 0) {
                    return true;
                }
            }
        }
        return false;
    } finally {
        document.close();
    }
}
But unfortunately, the line
String text = stripper.getTextForRegion(regionName);
always returns an empty String. What am I doing wrong?
Best Regards,
Andreas
-----Original Message-----
From: Erik Scholtz, ArgonSoft GmbH [mailto:[email protected]]
Sent: Wednesday, February 3, 2010 17:03
To: [email protected]
Subject: Re: PDF contains any text?
Andreas,
telling what a document contains without parsing its content sounds to
me like you are looking for the
PDDocument.oracle_of_delphi() method :)
But to answer your question: no. You have to look at the resources of
each page to see whether it has any text resources, such as fonts.
There is no central "resource_available" dictionary in PDF.
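
The per-page resource check described above could be sketched roughly as follows. This is only a heuristic sketch, assuming the PDFBox 1.x API used elsewhere in this thread (PDPage.findResources() and PDResources.getFonts()); the class name FontResourceCheck and the method name anyPageDeclaresFonts are made up for illustration and are not part of PDFBox.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;

// Hypothetical helper class, not part of PDFBox.
class FontResourceCheck {

    /**
     * Returns true if at least one page declares a font in its resources.
     * Assumption: a page that shows text needs a font resource, so a
     * document without any font resources most likely contains no text.
     */
    static boolean anyPageDeclaresFonts(PDDocument document) throws IOException {
        List<?> pages = document.getDocumentCatalog().getAllPages();
        for (Object pageObj : pages) {
            PDPage page = (PDPage) pageObj;
            // findResources() also considers resources inherited from parent nodes.
            PDResources resources = page.findResources();
            if (resources == null) {
                continue; // page has no resource dictionary at all
            }
            Map<?, ?> fonts = resources.getFonts();
            if (fonts != null && !fonts.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the PDF file to inspect.
        PDDocument document = PDDocument.load(new File(args[0]));
        try {
            System.out.println(anyPageDeclaresFonts(document));
        } finally {
            document.close();
        }
    }
}
```

Note that this only reads the resource dictionaries; it does not parse any content stream. A declared font does not prove that text is actually drawn, and fonts that are referenced only inside form XObjects would be missed, so the result is a hint rather than a definite answer.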
Best regards,
Erik
Roeder, Andreas wrote:
> Hi,
>
> Is there a way to find out if a PDF contains any text without parsing
> the whole document? Some PDFs contain just images.
>
> Best Regards,
>
> Andreas
>