Re: AW: PDF contains any text?

Andreas Lehmkühler Fri, 05 Feb 2010 01:37:25 -0800

Hi,

Sent: Fri, 05 Feb 2010 From: Roeder, Andreas <[email protected]>


> Erik,
> 
> I wrote the following code:
> 
> public boolean containsText() throws IOException {
>     PDDocument document = null;
>     try {
>         document = PDDocument.load(pdfFile);
>         PDFTextStripperByArea stripper = new PDFTextStripperByArea();
> 
>         List<PDPage> pages = document.getDocumentCatalog().getAllPages();
> 
>         for (PDPage page : pages) {
>             PDRectangle rectancle = page.getTrimBox();
>             Rectangle2D.Float awtRect = new Rectangle2D.Float(
>                     rectancle.getLowerLeftX(), rectancle.getUpperRightY(),
>                     rectancle.getWidth(), rectancle.getHeight());
>             stripper.addRegion(page.toString(), awtRect);
>             stripper.extractRegions(page);
>             for (Object regionObj : stripper.getRegions()) {
>                 String regionName = (String) regionObj;
>                 String text = stripper.getTextForRegion(regionName);
>                 if (text != null && text.length() > 0) {
>                     return true;
>                 }
>             }
>         }
>         return false;
>     } finally {
>         document.close();
>     }
> }
> 
> 
> But unfortunately the line:
> 
>       String text = stripper.getTextForRegion(regionName);
> 
> always returns an empty String. What am I doing wrong?

I haven't tested your code in detail, but I think you should use

new Rectangle2D.Float(rectancle.getLowerLeftX(), rectancle.getLowerLeftY(),
                      rectancle.getWidth(), rectancle.getHeight());

to create the area you are looking for.
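
To illustrate why the y origin matters, here is a minimal sketch that uses only java.awt.geom and no PDFBox classes. The class RegionCheck, its method names, the 612 x 792 box, and the sample point are all made up for illustration. It also assumes that text positions are compared against the region with y values between 0 and the page height, which is my reading of the behaviour and is not confirmed in this thread.

```java
import java.awt.geom.Rectangle2D;

// Illustrative helper, not part of PDFBox: it only reproduces the rectangle
// arithmetic discussed above with plain java.awt.geom classes.
class RegionCheck {

    // Region as built in the original post: y origin taken from the upper-right corner.
    static Rectangle2D.Float originalRegion(float llx, float lly, float urx, float ury) {
        return new Rectangle2D.Float(llx, ury, urx - llx, ury - lly);
    }

    // Region as suggested in the reply: y origin taken from the lower-left corner.
    static Rectangle2D.Float suggestedRegion(float llx, float lly, float urx, float ury) {
        return new Rectangle2D.Float(llx, lly, urx - llx, ury - lly);
    }

    public static void main(String[] args) {
        // Hypothetical page box with its origin at (0, 0).
        float llx = 0f, lly = 0f, urx = 612f, ury = 792f;
        // Hypothetical text position on the page (assumption: 0 <= y <= page height).
        double x = 100.0, y = 700.0;

        System.out.println("original region contains point:  "
                + originalRegion(llx, lly, urx, ury).contains(x, y));
        System.out.println("suggested region contains point: "
                + suggestedRegion(llx, lly, urx, ury).contains(x, y));
    }
}
```

Under these assumptions the original region spans y from 792 to 1584 and misses the sample point, while the suggested region spans y from 0 to 792 and contains it. That would be consistent with getTextForRegion returning an empty String for every page.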


> Best Regards,
> 
> Andreas

BR
Andreas Lehmkühler

> -----Original Message-----
> From: Erik Scholtz, ArgonSoft GmbH [mailto:[email protected]] 
> Sent: Wednesday, 3 February 2010 17:03
> To: [email protected]
> Subject: Re: PDF contains any text?
> 
> 
> Andreas,
> 
> Finding out what a document contains without parsing its content sounds
> to me like you are looking for the PDDocument.oracle_of_delphi()
> method :)
> 
> But to answer your question: no. To find out, you have to check the
> resources of each page for text resources. There is no central
> "resource_available" dictionary in PDF.
> 
> 
> Best regards,
> Erik
> 
> Roeder, Andreas wrote:
> > Hi,
> > 
> > Is there a way to find out if a PDF contains any text without parsing 
> > the whole document? Some PDFs contain just images.
> > 
> > Best Regards,
> > 
> > Andreas
> > 
> 
> 

--- End of original message ---
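
Erik's suggestion above, checking each page's resources for text resources, can be sketched as a loop that stops at the first page declaring a font. The sketch below is a toy model and not the PDFBox API: TextResourceCheck, anyPageHasFonts, and the map-based page representation are hypothetical. Treating a "Font" entry as the sign of text is also my assumption, since a page can declare a font without drawing any visible text.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, not the PDFBox API: each page is a map from resource category
// (for example "Font" or "XObject") to the set of resource names in it.
class TextResourceCheck {

    // Returns true as soon as one page declares at least one font resource.
    static boolean anyPageHasFonts(List<Map<String, Set<String>>> pages) {
        for (Map<String, Set<String>> resources : pages) {
            Set<String> fonts = resources.get("Font");
            if (fonts != null && !fonts.isEmpty()) {
                return true; // no need to look at the remaining pages
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical image-only document: pages reference images, no fonts.
        List<Map<String, Set<String>>> scanned = List.of(
                Map.of("XObject", Set.of("Im0")),
                Map.of("XObject", Set.of("Im1")));
        // Hypothetical document whose second page declares a font.
        List<Map<String, Set<String>>> mixed = List.of(
                Map.of("XObject", Set.of("Im0")),
                Map.of("Font", Set.of("F1"), "XObject", Set.of("Im1")));

        System.out.println("scanned has fonts: " + anyPageHasFonts(scanned));
        System.out.println("mixed has fonts:   " + anyPageHasFonts(mixed));
    }
}
```

A check like this only reads the resource dictionaries, so it avoids extracting text from every page, but it can only say that text is possible, not that any is actually shown.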
