Hi,
On 15.10.2012 03:56, Nicholas Tiong wrote:
Hi Andreas,
I've commented out the 'do' line, but still cannot get rid of the images.
I've basically opened the document and loaded the resources and then saved
the document. See code below.
This seems to be insufficient. Do I need to parse the PDF stream somehow?
Oops, I guess there was a misunderstanding. My idea won't work if you want to
remove the images permanently. But I have another one, see below.
Regards,
Nicholas Tiong
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.exceptions.CryptographyException;
import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
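import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;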
public class ExtractImages {
    public static void main(String[] argv) throws COSVisitorException,
            InvalidPasswordException, CryptographyException, IOException {
        PDDocument document = PDDocument.load("input.pdf");
        if (document.isEncrypted()) {
            document.decrypt("");
        }
        PDDocumentCatalog catalog = document.getDocumentCatalog();
        for (Object pageObj : catalog.getAllPages()) {
            PDPage page = (PDPage) pageObj;
            PDResources resources = page.findResources();
You have to remove all images from the page's resource dictionary. I have
neither tested nor compiled this code, but it should make clear how it works.
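            // Images live in the /XObject sub-dictionary of the page resources.
            COSDictionary dictResources = resources.getCOSDictionary();
            COSDictionary dictXObjects =
                    (COSDictionary) dictResources.getDictionaryObject(COSName.XOBJECT);
            Map<String, PDXObjectImage> images = resources.getImages();
            Iterator<String> iter = images.keySet().iterator();
            while (dictXObjects != null && iter.hasNext())
            {
                dictXObjects.removeItem(COSName.getPDFName(iter.next()));
            }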
        }
        document.save("strippedOfImages.pdf");
        document.close(); // release the resources held by the loaded document
    }
}
SNIP
We should probably add a removeItem method to the PDResources class.
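
To make the idea of removing the images from the dictionary more concrete, here is a minimal, self-contained sketch that uses plain java.util maps instead of PDFBox classes. It assumes the usual PDF layout, in which the page's resource dictionary holds images in a nested XObject dictionary next to form XObjects, each marked by its Subtype entry. The names ResourceModel, samplePage, subtype and removeImages are made up for this example and are not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for a PDF resource dictionary; NOT the PDFBox API.
class ResourceModel {

    // Builds a resource dictionary with one font and three XObjects
    // (two images, one form). All names are invented for the example.
    static Map<String, Object> samplePage() {
        Map<String, Object> xobjects = new LinkedHashMap<String, Object>();
        xobjects.put("Im0", subtype("Image"));
        xobjects.put("Fm0", subtype("Form"));
        xobjects.put("Im1", subtype("Image"));

        Map<String, Object> fonts = new LinkedHashMap<String, Object>();
        fonts.put("F1", subtype("Type1"));

        Map<String, Object> resources = new LinkedHashMap<String, Object>();
        resources.put("Font", fonts);
        resources.put("XObject", xobjects);
        return resources;
    }

    // Creates a tiny dictionary that only carries a Subtype entry.
    static Map<String, Object> subtype(String value) {
        Map<String, Object> dict = new LinkedHashMap<String, Object>();
        dict.put("Subtype", value);
        return dict;
    }

    // Hypothetical helper: drops every image entry from the nested
    // XObject dictionary and returns the names that were removed.
    static List<String> removeImages(Map<String, Object> resources) {
        List<String> removed = new ArrayList<String>();
        Object nested = resources.get("XObject");
        if (!(nested instanceof Map)) {
            return removed; // no XObjects on this page
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> xobjects = (Map<String, Object>) nested;
        Iterator<Map.Entry<String, Object>> it = xobjects.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Object> entry = it.next();
            Object value = entry.getValue();
            if (value instanceof Map
                    && "Image".equals(((Map<?, ?>) value).get("Subtype"))) {
                removed.add(entry.getKey());
                it.remove(); // safe way to remove while iterating
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Object> resources = samplePage();
        System.out.println("Removed: " + removeImages(resources));     // Removed: [Im0, Im1]
        System.out.println("Remaining: "
                + ((Map<?, ?>) resources.get("XObject")).keySet());     // Remaining: [Fm0]
    }
}
```

Note that the sketch only touches the resources: the fonts and the form XObject stay in place, and a page content stream that still refers to the removed names is not changed by it. How this maps onto a real PDResources object depends on the PDFBox version in use, so treat it as an illustration of the structure rather than as library code.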
BR
Andreas Lehmkühler