Hi all,
I found a solution for embedding images back into the PDF document,
replacing the old image streams.
Quick notes about the process:
- I embed images that were extracted from the PDF with
PDXObjectImage.write2file(). The files are edited in place, so I'm
sure the file format is the same as that of the embedded stream;
- When I edit files in place, I do not change the image configuration
(width, height, etc. stay the same);
- The document is not saved between the extraction and embedding phases.
Regarding the example below, those notes matter because:
- I copy the raw file content into the stream and do not update the
stream dictionary;
- I use the COSObject number to identify each stream, and saving the
PDF may change those numbers.
It may help someone else, though it may not be the best solution.
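The in-place editing step from the notes (stamping a watermark without changing the file format or the pixel size) might look like the stdlib-only sketch below. The class name `WatermarkInPlace`, the translucent red band used as the "stamp", and the `formatName` parameter are illustrative assumptions, not part of the original code; for a JPEG extracted as `.jpg` one would pass "jpg". Re-encoding through ImageIO keeps the format and size but not the exact compression settings, and the JDK reader may refuse some PDF JPEGs (CMYK ones, for instance). Also note that a raw copy back into the stream only makes sense when the file bytes are exactly what the stream's filter expects, which, as far as I know, is the case for JPEG (DCTDecode) data.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class WatermarkInPlace {

    /**
     * Paints a translucent band over the bottom quarter of an image file and
     * rewrites the file in place with the given ImageIO format name
     * ("jpg", "png", ...). Width and height are left unchanged.
     */
    public static void stamp(File imageFile, String formatName) throws IOException {
        BufferedImage img = ImageIO.read(imageFile);
        if (img == null) {
            throw new IOException("No ImageIO reader for " + imageFile);
        }
        int w = img.getWidth();
        int h = img.getHeight();
        int band = Math.max(1, h / 4);

        Graphics2D g = img.createGraphics();
        try {
            // Alpha 128 blends the stamp with the original pixels (default SrcOver composite).
            g.setColor(new Color(255, 0, 0, 128));
            g.fillRect(0, h - band, w, band);
        } finally {
            g.dispose();
        }

        // Same format name and same BufferedImage size: only the pixels change.
        if (!ImageIO.write(img, formatName, imageFile)) {
            throw new IOException("No ImageIO writer for format " + formatName);
        }
    }
}
```

Calling `WatermarkInPlace.stamp(new File(dir, "12.jpg"), "jpg")` on each extracted file between the extraction and embedding phases would match the workflow described above.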
// Imports assume the PDFBox 1.x package layout.
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSObject;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;
import org.apache.pdfbox.persistence.util.COSObjectKey;

/**
 * Embeds back every bitmap image of the document that is found in the
 * specified directory.
 * @param doc Document whose images are replaced
 * @param dir Directory holding the edited images, named
 *            <objectNumber>.<suffix>
 * @throws Exception
 */
public static void embedImages( PDDocument doc, File dir )
        throws Exception {
    if( dir.exists() ) {
        if( !dir.isDirectory() ) {
            // A plain file is in the way: use a sibling "<name>-img" directory.
            embedImages( doc, new File( dir.getCanonicalPath()+"-img" ) );
            return;
        }
    }
    else {
        dir.mkdirs();
    }
    // Walk every object listed in the xref table and keep image XObjects.
    for( COSObjectKey key : doc.getDocument().getXrefTable().keySet() ) {
        COSObject object = doc.getDocument().getObjectFromPool( key );
        if( COSName.IMAGE.equals(
                object.getDictionaryObject( COSName.SUBTYPE ) ) )
            embedSingleImage( object, dir );
    }
}

/**
 * Embeds back the image file matching a COSObject from the specified
 * directory. The image may be an image mask, which is not handled yet;
 * this is guessed from the ImageMask flag, though there may be better
 * indicators.
 * IMPORTANT NOTICE: the file content is copied as-is into the old stream
 * and the stream dictionary is left untouched. If the image size changes,
 * the final display will show distortion.
 * @param imObj The COSObject referencing the image stream
 * @param dir The directory holding the edited image file
 * @throws Exception
 */
protected static void embedSingleImage( COSObject imObj, File dir )
        throws Exception {
    PDXObjectImage im = (PDXObjectImage) PDXObject.createXObject(
            (COSStream) imObj.getObject() );
    if( im.getImageMask() ) return;
    File inFile = new File( dir,
            imObj.getObjectNumber().intValue()+"."+im.getSuffix() );
    if( !inFile.exists() )
        throw new Exception( "The file `"+inFile.getCanonicalPath()
                +"` doesn't exist and cannot be embedded." );
    InputStream newStream = new FileInputStream( inFile );
    try {
        // Write the raw file bytes as the stream's encoded (filtered) data.
        OutputStream embeddedStream =
                im.getCOSStream().createFilteredStream();
        try {
            byte[] b = new byte[10240];
            int bytesRead;
            while( ( bytesRead = newStream.read( b ) ) != -1 )
                embeddedStream.write( b, 0, bytesRead );
        }
        finally {
            embeddedStream.close();
        }
    }
    finally {
        // Always release the file handle, even if the copy fails.
        newStream.close();
    }
}
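The IMPORTANT NOTICE in `embedSingleImage` could be turned into an explicit check before the copy. The stdlib-only sketch below reads just the image header through ImageIO and compares its size with the expected one; the class and method names are hypothetical, and the expected width and height would have to come from the image stream itself (e.g. `im.getWidth()` and `im.getHeight()` on the `PDXObjectImage`, assuming those getters are available in the PDFBox version in use).

```java
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class ImageSizeCheck {

    /** Returns true if the image file has exactly the given pixel dimensions. */
    public static boolean hasSize(File file, int expectedWidth, int expectedHeight)
            throws IOException {
        ImageInputStream in = ImageIO.createImageInputStream(file);
        if (in == null) {
            throw new IOException("Cannot open " + file);
        }
        try {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader for " + file);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                // Only the header is needed to learn the size of image 0.
                return reader.getWidth(0) == expectedWidth
                        && reader.getHeight(0) == expectedHeight;
            } finally {
                reader.dispose();
            }
        } finally {
            in.close();
        }
    }
}
```

Throwing when `hasSize(inFile, w, h)` returns false, right before the copy loop, would turn a silent distortion into an explicit error.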
Julien PLÉE
On 26 Aug 2010, at 00:47, [email protected] wrote:
Julien,
Doesn't this code[1] create a new image object which is in no way attached
to the PDF? Modifying "(COSStream) obj.getObject()" directly seems like it
would do what you intend. I'm not familiar with PDXObject.createXObject(),
but it seems like it would create a copy of the data passed to it (similar
to a copy constructor). Obviously, modifying a copy isn't going to affect
the original.
I'm pretty sure that's your problem, but I've never done anything with
streams or images in PDFs, so I'm afraid I don't know how it's supposed to
be done.
Another thing that might be important: some PDF programs don't write out
anything in the xref table. This doesn't follow the spec, but Adobe Reader
opens such files fine either way, so many people don't realize they're out
of spec (and thus expect your code to process them the same as a proper
PDF).
[1] PDXObjectImage image = (PDXObjectImage)
        PDXObject.createXObject((COSStream) obj.getObject() );
----
Thanks,
Adam
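Regarding the remark about PDFs with an empty xref table: objects can still be located by their "N G obj" headers in the file body. The sketch below is a crude, stdlib-only heuristic (the class name is hypothetical): it may also match bytes inside binary stream data, and it does not see objects stored inside compressed object streams (PDF 1.5+), so it is only a starting point, not a replacement for a real parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ObjectHeaderScan {

    // "12 0 obj": object number, generation number, keyword. The lookbehind
    // and the digit limits keep long digit runs from being split or overflowing.
    private static final Pattern HEADER =
            Pattern.compile("(?<![0-9])(\\d{1,9})\\s+(\\d{1,5})\\s+obj\\b");

    /** Returns the object numbers of all "N G obj" headers, in file order. */
    public static List<Integer> objectNumbers(byte[] pdfBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so binary data cannot break decoding.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(text);
        List<Integer> numbers = new ArrayList<>();
        while (m.find()) {
            numbers.add(Integer.parseInt(m.group(1)));
        }
        return numbers;
    }
}
```

The numbers found this way could then be checked against the xref entries to spot objects that the xref-based loop would miss.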
From: Julien Plée <[email protected]>
To: [email protected]
Date: 08/25/2010 15:00
Subject: Replacing images contents
Hello,
I have to put a watermark stamp on images stored in PDF files, and I'm
having a hard time embedding the images back into the PDF.
I'm using the XrefTable to filter images. For embedding, I'm trying to
replace the stream of the original object, but with no luck: the saved
PDF always looks the same.
Here is my method, focused on the PDXObjectImage:
/**
 * Replaces a PDF image content with content from an image file on the
 * file system, identified by the object id.
 *
 * (this.doc : PDDocument)
 * @param obj
 * @throws IOException
 */
protected void embedImageBack(COSObject obj) throws IOException
{
    String path = "img/";
    PDXObjectImage image = (PDXObjectImage) PDXObject.createXObject(
            (COSStream) obj.getObject() );
    File inputFile = new File(
            path+obj.getObjectNumber()+"."+image.getSuffix() );
    PDXObjectImage newImage = null;
    if (image.getSuffix().equals("jpg"))
        newImage = new PDJpeg( this.doc, new FileInputStream(inputFile) );
    else
        newImage = new PDCcitt( this.doc,
                (RandomAccess) new RandomAccessFile( inputFile, "r" ) );
    image.getCOSStream().replaceWithStream(newImage.getCOSStream());
    this.shouldSaveDoc = true;
}
After all images have been processed, I save the document to a new
file, but apart from the file size changing, nothing visible happens.
Thanks for any help.
Julien PLÉE