Hi,
Sent: Fri, 05 Feb 2010 From: Roeder, Andreas <[email protected]>
> Erik,
>
> I wrote the following code:
>
> public boolean containsText() throws IOException {
>
>     PDDocument document = null;
>     try {
>         document = PDDocument.load(pdfFile);
>         PDFTextStripperByArea stripper = new PDFTextStripperByArea();
>
>         List<PDPage> pages =
>                 document.getDocumentCatalog().getAllPages();
>
>         for (PDPage page : pages) {
>             PDRectangle rectancle = page.getTrimBox();
>             Rectangle2D.Float awtRect = new Rectangle2D.Float(
>                     rectancle.getLowerLeftX(), rectancle.getUpperRightY(),
>                     rectancle.getWidth(),
>                     rectancle.getHeight());
>             stripper.addRegion(page.toString(), awtRect);
>             stripper.extractRegions(page);
>             for (Object regionObj : stripper.getRegions()) {
>                 String regionName = (String) regionObj;
>                 String text =
>                         stripper.getTextForRegion(regionName);
>                 if (text != null && text.length() > 0) {
>                     return true;
>                 }
>             }
>         }
>         return false;
>     } finally {
>         document.close();
>     }
> }
>
>
> But unfortunately the line:
>
> String text = stripper.getTextForRegion(regionName);
>
> always returns an empty String. What am I doing wrong?
I didn't test your source in detail, but I think you should use

    new Rectangle2D.Float(rectancle.getLowerLeftX(), rectancle.getLowerLeftY(),
                          rectancle.getWidth(), rectancle.getHeight());

to create the area you are looking for.
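
A minimal sketch of why the y argument matters, using only java.awt.geom and no PDFBox. Rectangle2D.Float takes (x, y, width, height) and covers the range from y to y + height, so a region that starts at getUpperRightY() lies entirely outside the box. The class name, the US Letter box and the glyph position are assumptions made up for illustration; how PDFBox maps glyph positions into region coordinates is not modeled here.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

// Stdlib-only illustration; page size and glyph position are made-up values.
class RegionOriginDemo {

    // Region as built in the question: y starts at the box's upper edge.
    static Rectangle2D.Float regionFromUpperY(float llx, float lly, float urx, float ury) {
        return new Rectangle2D.Float(llx, ury, urx - llx, ury - lly);
    }

    // Region as suggested in the reply: y starts at the box's lower edge.
    static Rectangle2D.Float regionFromLowerY(float llx, float lly, float urx, float ury) {
        return new Rectangle2D.Float(llx, lly, urx - llx, ury - lly);
    }

    public static void main(String[] args) {
        // Assumed US Letter box: lower-left (0, 0), upper-right (612, 792).
        Rectangle2D.Float fromUpper = regionFromUpperY(0f, 0f, 612f, 792f);
        Rectangle2D.Float fromLower = regionFromLowerY(0f, 0f, 612f, 792f);

        // Hypothetical glyph position somewhere on the page.
        Point2D.Float glyph = new Point2D.Float(100f, 700f);

        // The first region spans y = 792..1584, so it misses the glyph.
        System.out.println("upper-Y region contains glyph: " + fromUpper.contains(glyph));
        // The second region spans y = 0..792, so it contains the glyph.
        System.out.println("lower-Y region contains glyph: " + fromLower.contains(glyph));
    }
}
```

With these assumed values the first line prints false and the second prints true.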
> Best Regards,
>
> Andreas
BR
Andreas Lehmkühler
> -----Original Message-----
> From: Erik Scholtz, ArgonSoft GmbH [mailto:[email protected]]
> Sent: Wednesday, 3 February 2010 17:03
> To: [email protected]
> Subject: Re: PDF contains any text?
>
>
> Andreas,
>
> telling what a document contains without parsing its content
> sounds to me like you are looking for the
> PDDocument.oracle_of_delphi() method :)
>
> But to answer your question: No. To find out, you have to look at the
> resources of each page and check whether there are text resources or
> not. There is no central "resource_available" dictionary in PDF.
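
The per-page check described above can be sketched as a scan that stops at the first page listing a font. This is a hypothetical stdlib-only model, not the PDFBox API: each page is reduced to a map from resource category ("Font", "XObject") to resource names, and PageResourceScan and anyPageHasFonts are names invented for illustration. A font entry only suggests that a page may draw text; it does not prove it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model, not the PDFBox API: each page is reduced to the names
// found in its resource dictionary, grouped by category.
class PageResourceScan {

    // Returns true as soon as one page lists at least one font resource.
    static boolean anyPageHasFonts(List<Map<String, List<String>>> pages) {
        for (Map<String, List<String>> resources : pages) {
            List<String> fonts = resources.get("Font");
            if (fonts != null && !fonts.isEmpty()) {
                return true; // stop early, later pages are not inspected
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up sample pages: one with only an image, one with a font.
        Map<String, List<String>> imageOnly =
                Collections.singletonMap("XObject", Arrays.asList("Im0"));
        Map<String, List<String>> withText =
                Collections.singletonMap("Font", Arrays.asList("F1"));

        System.out.println(anyPageHasFonts(Arrays.asList(imageOnly)));
        System.out.println(anyPageHasFonts(Arrays.asList(imageOnly, withText)));
    }
}
```

For the two sample documents this prints false, then true.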
>
>
> Best regards,
> Erik
>
> Roeder, Andreas wrote:
> > Hi,
> >
> > Is there a way to find out if a PDF contains any text without parsing
> > the whole document? Some PDFs contain just images.
> >
> > Best Regards,
> >
> > Andreas
> >
>
>
--- End of original message ----