Hi,

On 15.10.2012 03:56, Nicholas Tiong wrote:
Hi Andreas,

I've commented out the 'do' line, but still cannot get rid of the images.

I've basically opened the document and loaded the resources and then saved
the document. See code below.

This seems to be insufficient. Do I need to parse the PDF stream somehow?
Oops, I guess there was a misunderstanding. My idea won't work if you want to remove the images permanently, but I have another one; see below.

Regards,
Nicholas Tiong

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.exceptions.CryptographyException;
import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.resources.*;
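import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;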

public class ExtractImages {
     public static void main(String[] argv) throws COSVisitorException,
             InvalidPasswordException, CryptographyException, IOException {
         PDDocument document = PDDocument.load("input.pdf");

         if (document.isEncrypted()) {
             document.decrypt("");
         }

         PDDocumentCatalog catalog = document.getDocumentCatalog();
         for (Object pageObj :  catalog.getAllPages()) {
             PDPage page = (PDPage) pageObj;
             PDResources resources = page.findResources();

You have to remove all images from the page's resource dictionary. I have neither tested nor compiled this code, but it should make clear how it works.


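        // Untested: images are XObjects, which live in the /XObject
        // sub-dictionary of the page resources, so remove them from there.
        COSDictionary dictResources = resources.getCOSDictionary();
        COSDictionary dictXObjects = (COSDictionary)
                dictResources.getDictionaryObject(COSName.getPDFName("XObject"));
        Map<String,PDXObjectImage> images = resources.getImages();
        Iterator<String> iter = images.keySet().iterator();
        while( dictXObjects != null && iter.hasNext() )
        {
                dictXObjects.removeItem(COSName.getPDFName(iter.next()));
        }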



         }

         document.save("strippedOfImages.pdf");
     }
}


SNIP

We should probably add a removeItem method to the PDResources class.
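
To make the idea concrete, here is a rough, PDFBox-free sketch of what such a removal boils down to. It only models the nested XObject dictionary of a page's resources with plain java.util maps. The class ResourceModel and its methods addXObject, removeImages and contains are hypothetical stand-ins for illustration, not existing PDFBox API, and the real dictionaries hold COS objects rather than strings.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a page's resource dictionary; not PDFBox API.
class ResourceModel {
    // Models /Resources -> /XObject -> { resource name -> XObject dictionary }.
    private final Map<String, Map<String, String>> xObjects =
            new LinkedHashMap<String, Map<String, String>>();

    // Registers an XObject under a resource name with the given /Subtype.
    public void addXObject(String name, String subtype) {
        Map<String, String> dict = new LinkedHashMap<String, String>();
        dict.put("Subtype", subtype);
        xObjects.put(name, dict);
    }

    // Removes every XObject whose /Subtype is "Image" and returns the count.
    public int removeImages() {
        int removed = 0;
        Iterator<Map.Entry<String, Map<String, String>>> iter =
                xObjects.entrySet().iterator();
        while (iter.hasNext()) {
            if ("Image".equals(iter.next().getValue().get("Subtype"))) {
                iter.remove(); // removing via the iterator is safe mid-iteration
                removed++;
            }
        }
        return removed;
    }

    // Reports whether an XObject is still registered under the given name.
    public boolean contains(String name) {
        return xObjects.containsKey(name);
    }
}
```

In this model, form XObjects (subtype "Form") stay untouched and only image entries are dropped. A page content stream that still draws a removed image by name would need to be cleaned up separately.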

BR
Andreas Lehmkühler
