Re: Help with removing images from a PDF

Andreas Lehmkühler Tue, 16 Oct 2012 00:40:52 -0700

There is one small but important correction to my code, see below.

On 16 October 2012 at 08:10, Andreas Lehmkuehler <[email protected]>
wrote:
> hi,
>
>
> On 15.10.2012 03:56, Nicholas Tiong wrote:
> > Hi Andreas,
> >
> > I've commented out the 'do' line, but still cannot get rid of the images.
> >
> > I've basically opened the document and loaded the resources and then saved
> > the document. See code below.
> >
> > This seems to be insufficient. Do I need to parse the PDF stream somehow?
> Oops, I guess there was a misunderstanding. My idea won't work if you want to
> remove the images permanently. But I have another idea, see below.
>
> > Regards,
> > Nicholas Tiong
> >
> > import org.apache.pdfbox.exceptions.COSVisitorException;
> > import org.apache.pdfbox.exceptions.CryptographyException;
> > import org.apache.pdfbox.exceptions.InvalidPasswordException;
> > import org.apache.pdfbox.pdmodel.PDDocument;
> > import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
> > import org.apache.pdfbox.pdmodel.PDPage;
> > import org.apache.pdfbox.pdmodel.PDResources;
> > import org.apache.pdfbox.resources.*;
> > import java.io.IOException;
> >
> > public class ExtractImages {
> >      public static void main(String[] argv) throws COSVisitorException,
> > InvalidPasswordException, CryptographyException, IOException {
> >          PDDocument document = PDDocument.load("input.pdf");
> >
> >          if (document.isEncrypted()) {
> >              document.decrypt("");
> >          }
> >
> >          PDDocumentCatalog catalog = document.getDocumentCatalog();
> >          for (Object pageObj :  catalog.getAllPages()) {
> >              PDPage page = (PDPage) pageObj;
> >              PDResources resources = page.findResources();
>
> You have to remove all images from the dictionary. I have neither tested nor
> compiled this code, but it should make clear how it works.
>
>
>    COSDictionary dictResources = resources.getCOSDictionary();


// get the XObject dictionary (this replaces the quoted line above)
COSDictionary dictResources = (COSDictionary) resources.getCOSDictionary()
        .getDictionaryObject(COSName.XOBJECT);

>    Map<String,PDXObjectImage> images = resources.getImages();
>    Iterator<String> iter = images.keySet().iterator();
>    while( iter.hasNext() )
>    {
>            dictResources.removeItem(COSName.getPDFName(iter.next()));
>    }
>
> >
> >
> >          }
> >
> >          document.save("strippedOfImages.pdf");
> >      }
> > }
> >
> >
> > SNIP
>
> We should probably add a removeItem method to the PDResources class.
>
> BR
> Andreas Lehmkühler
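
To illustrate why the change matters, here is a small stand-alone sketch. It
uses plain java.util maps as a stand-in for COSDictionary, so it is not PDFBox
code, and the class name XObjectRemovalSketch and the helper removeImages are
made up for this example. The point it tries to show: the image names are keys
of the /XObject sub-dictionary, not of the resources dictionary itself, so the
removal has to happen on the sub-dictionary.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: nested maps play the role of COSDictionary objects.
class XObjectRemovalSketch {

    /**
     * Removes the given names from the "XObject" sub-dictionary of a
     * resources dictionary and returns how many entries were removed.
     */
    static int removeImages(Map<String, Object> resources, Iterable<String> imageNames) {
        Object sub = resources.get("XObject");
        if (!(sub instanceof Map)) {
            return 0; // no XObject sub-dictionary, nothing to remove
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> xobjects = (Map<String, Object>) sub;
        int removed = 0;
        for (String name : imageNames) {
            if (xobjects.remove(name) != null) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Object> xobjects = new LinkedHashMap<>();
        xobjects.put("Im0", "image stream 0");
        xobjects.put("Im1", "image stream 1");
        xobjects.put("Fm0", "form stream"); // not an image, should survive

        Map<String, Object> resources = new LinkedHashMap<>();
        resources.put("Font", new LinkedHashMap<String, Object>());
        resources.put("XObject", xobjects);

        List<String> imageNames = Arrays.asList("Im0", "Im1");

        // My first version: removing from the resources dictionary itself
        // finds no matching keys, so the images stay where they are.
        for (String name : imageNames) {
            resources.remove(name);
        }
        System.out.println("after removing from resources: " + xobjects.keySet());

        // Corrected version: remove from the XObject sub-dictionary.
        int removed = removeImages(resources, imageNames);
        System.out.println("removed " + removed + ", remaining: " + xobjects.keySet());
    }
}
```

In the real code the same check for a missing XObject dictionary is probably
worth having, since getDictionaryObject can return null for pages without
XObjects. The page content stream may also still refer to the removed names; I
have not checked how viewers react to that.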
