Re: Help with removing images from a PDF

Andreas Lehmkuehler Mon, 15 Oct 2012 23:11:59 -0700

Hi,


On 15.10.2012 03:56, Nicholas Tiong wrote:

Hi Andreas,

I've commented out the 'do' line, but still cannot get rid of the images.

I've basically opened the document and loaded the resources and then saved
the document. See code below.

This seems to be insufficient. Do I need to parse the PDF stream somehow?

Oops, I guess there was a misunderstanding. My idea won't work if you want to remove the images permanently. But I have another one, see below.

Regards,
Nicholas Tiong

import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.exceptions.CryptographyException;
import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;
import org.apache.pdfbox.resources.*;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;

public class ExtractImages {
     public static void main(String[] argv) throws COSVisitorException,
             InvalidPasswordException, CryptographyException, IOException {
         PDDocument document = PDDocument.load("input.pdf");

         if (document.isEncrypted()) {
             document.decrypt("");
         }

         PDDocumentCatalog catalog = document.getDocumentCatalog();
         for (Object pageObj :  catalog.getAllPages()) {
             PDPage page = (PDPage) pageObj;
             PDResources resources = page.findResources();

You have to remove all images from the dictionary. I have neither tested nor compiled this code, but it should make clear how it works.



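        // Image XObjects are stored in the /XObject sub-dictionary of the
        // page resources, so the entries have to be removed from there.
        COSDictionary dictResources = resources.getCOSDictionary();
        COSDictionary dictXObjects =
                (COSDictionary) dictResources.getDictionaryObject(COSName.XOBJECT);
        Map<String, PDXObjectImage> images = resources.getImages();
        if (dictXObjects != null && images != null)
        {
            Iterator<String> iter = images.keySet().iterator();
            while (iter.hasNext())
            {
                dictXObjects.removeItem(COSName.getPDFName(iter.next()));
            }
        }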



         }

         document.save("strippedOfImages.pdf");
     }
}
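
To make the dictionary structure a bit clearer, here is a minimal sketch that does not use PDFBox at all. It models a page's resources as nested java.util maps, assuming the images sit in an "XObject" sub-dictionary and are marked with the Subtype "Image". The class ResourceModel, its methods and the sample entries (Im0, Im1, Fm0) are made up for illustration only; they are not part of any library.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-in for a page's resources dictionary; this is not PDFBox code.
class ResourceModel {

    // Builds a hypothetical resources dictionary: two images and one form
    // XObject under "XObject", plus an unrelated "Font" entry.
    static Map<String, Object> sampleResources() {
        Map<String, Object> xobjects = new LinkedHashMap<String, Object>();
        xobjects.put("Im0", xobject("Image"));
        xobjects.put("Im1", xobject("Image"));
        xobjects.put("Fm0", xobject("Form"));

        Map<String, Object> resources = new LinkedHashMap<String, Object>();
        resources.put("Font", new LinkedHashMap<String, Object>());
        resources.put("XObject", xobjects);
        return resources;
    }

    // Creates a tiny XObject dictionary that carries only its subtype.
    private static Map<String, Object> xobject(String subtype) {
        Map<String, Object> dict = new LinkedHashMap<String, Object>();
        dict.put("Subtype", subtype);
        return dict;
    }

    // Removes every entry of the "XObject" sub-dictionary whose Subtype is
    // "Image" and returns how many entries were removed.
    @SuppressWarnings("unchecked")
    static int removeImages(Map<String, Object> resources) {
        Object entry = resources.get("XObject");
        if (!(entry instanceof Map)) {
            return 0; // this page has no XObjects at all
        }
        Map<String, Object> xobjects = (Map<String, Object>) entry;
        int removed = 0;
        Iterator<Map.Entry<String, Object>> iter = xobjects.entrySet().iterator();
        while (iter.hasNext()) {
            Object value = iter.next().getValue();
            if (value instanceof Map
                    && "Image".equals(((Map<?, ?>) value).get("Subtype"))) {
                iter.remove(); // safe way to remove while iterating
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Object> resources = sampleResources();
        int removed = removeImages(resources);
        System.out.println("removed " + removed + ", remaining: " + resources);
    }
}
```

Only the image entries are dropped here; the form XObject and the fonts stay where they are, which is probably what you want when stripping images from a page.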


SNIP


We should probably add a removeItem-method to the PDResources class.

BR
Andreas Lehmkühler
