There is one small but important change to the code I posted; see below.
Andreas Lehmkuehler <[email protected]> wrote on 16 October 2012 at 08:10:
> hi,
>
>
> On 15.10.2012 03:56, Nicholas Tiong wrote:
> > Hi Andreas,
> >
> > I've commented out the 'do' line, but still cannot get rid of the images.
> >
> > I've basically opened the document and loaded the resources and then saved
> > the document. See code below.
> >
> > This seems to be insufficient. Do I need to parse the PDF stream somehow?
> Oops, I guess there was a misunderstanding. My idea won't work if you want to
> remove the images permanently. But I have another one, see below.
>
> > Regards,
> > Nicholas Tiong
> >
> > import org.apache.pdfbox.exceptions.COSVisitorException;
> > import org.apache.pdfbox.exceptions.CryptographyException;
> > import org.apache.pdfbox.exceptions.InvalidPasswordException;
> > import org.apache.pdfbox.pdmodel.PDDocument;
> > import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
> > import org.apache.pdfbox.pdmodel.PDPage;
> > import org.apache.pdfbox.pdmodel.PDResources;
> > import org.apache.pdfbox.resources.*;
> > import java.io.IOException;
> >
> > public class ExtractImages {
> > public static void main(String[] argv) throws COSVisitorException,
> > InvalidPasswordException, CryptographyException, IOException {
> > PDDocument document = PDDocument.load("input.pdf");
> >
> > if (document.isEncrypted()) {
> > document.decrypt("");
> > }
> >
> > PDDocumentCatalog catalog = document.getDocumentCatalog();
> > for (Object pageObj : catalog.getAllPages()) {
> > PDPage page = (PDPage) pageObj;
> > PDResources resources = page.findResources();
>
> You have to remove all images from the dictionary. I have neither tested nor
> compiled this code, but it should make clear how it works.
>
>
> COSDictionary dictResources = resources.getCOSDictionary();
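// Correction to the quoted line above: get the XObject dictionary instead.
// Needs org.apache.pdfbox.cos.COSDictionary and org.apache.pdfbox.cos.COSName.
COSDictionary dictResources = (COSDictionary) resources.getCOSDictionary()
        .getDictionaryObject(COSName.XOBJECT);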
> Map<String,PDXObjectImage> images = resources.getImages();
> Iterator<String> iter = images.keySet().iterator();
> while( iter.hasNext() )
> {
> dictResources.removeItem(COSName.getPDFName(iter.next()));
> }
>
> >
> >
> > }
> >
> > document.save("strippedOfImages.pdf");
> > }
> > }
> >
> >
> > SNIP
>
> We should probably add a removeItem method to the PDResources class.
>
> BR
> Andreas Lehmkühler
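
To illustrate why the change above matters, here is a toy model of the dictionary layout. It deliberately uses plain java.util maps rather than PDFBox classes, and the class name XObjectRemovalSketch, the helper names, and the entry names Im0, Im1 and F1 are all made up for illustration. The point it tries to show: image entries sit in the XObject sub-dictionary of the resources dictionary, so removing their names from the top-level resources dictionary finds nothing to remove.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: plain maps stand in for PDFBox's COSDictionary objects.
class XObjectRemovalSketch {

    // Builds a stand-in for a page's resources dictionary. The names
    // "Im0", "Im1" and "F1" are invented for this example.
    static Map<String, Object> buildResources() {
        Map<String, Object> xobjects = new LinkedHashMap<>();
        xobjects.put("Im0", "image stream 0");
        xobjects.put("Im1", "image stream 1");

        Map<String, Object> fonts = new LinkedHashMap<>();
        fonts.put("F1", "Helvetica");

        Map<String, Object> resources = new LinkedHashMap<>();
        resources.put("Font", fonts);
        resources.put("XObject", xobjects);
        return resources;
    }

    // Returns the nested XObject sub-dictionary of the resources map.
    @SuppressWarnings("unchecked")
    static Map<String, Object> xobjectsOf(Map<String, Object> resources) {
        return (Map<String, Object>) resources.get("XObject");
    }

    // Removes the named entries from the given dictionary and returns how
    // many entries were actually present and removed.
    static int removeAll(Map<String, Object> dict, List<String> names) {
        int removed = 0;
        for (String name : names) {
            if (dict.remove(name) != null) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Object> resources = buildResources();
        // Copy the key set first so removal does not disturb the iteration.
        List<String> imageNames = new ArrayList<>(xobjectsOf(resources).keySet());

        // First attempt: remove from the resources dictionary itself.
        System.out.println("removed from resources: "
                + removeAll(resources, imageNames));
        // Corrected attempt: remove from the XObject sub-dictionary.
        System.out.println("removed from XObject:   "
                + removeAll(xobjectsOf(resources), imageNames));
        System.out.println("XObject entries left:   "
                + xobjectsOf(resources).size());
    }
}
```

Run as written, the first call removes 0 entries and the second removes 2, leaving the XObject map empty. With PDFBox the same idea applies to the COSDictionary obtained via COSName.XOBJECT, as in the corrected line above.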