Hi all,

I found a solution for embedding images back into the PDF document, replacing the old image streams.
Quick notes about the process:
- I embed images that were extracted from the PDF with PDXObjectImage.write2file(). Files are edited in place, so I'm sure that the file format will be the same as the embedded stream;
- When I edit files in place, I do not change the image properties (width, height, etc. stay the same);
- The document is not saved between the extracting and embedding phases.
For the example below, those notes are important because:
- I copy the raw file stream and do not update the stream dictionary;
- I use the COSObject number to identify each stream, and saving the PDF may change those numbers.
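
Since the stream dictionary is not updated, an edited file whose size differs from the original would display distorted. A minimal sketch of a guard for that, using only the standard javax.imageio API: the class name ImageSizeCheck and the method hasSize are hypothetical, and the expected width and height are assumed to come from the existing PDF image object (for instance from PDXObjectImage's width and height accessors). It only works for formats ImageIO can decode on your JVM, which may not include every suffix PDFBox writes.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of PDFBox.
final class ImageSizeCheck {

    /**
     * Returns true only if the file decodes to an image with exactly the
     * expected dimensions. Returns false when no ImageIO reader can
     * decode the file.
     */
    static boolean hasSize(File imageFile, int expectedWidth, int expectedHeight)
            throws IOException {
        BufferedImage img = ImageIO.read(imageFile);
        if (img == null) {
            return false; // no registered reader understands this file
        }
        return img.getWidth() == expectedWidth
                && img.getHeight() == expectedHeight;
    }
}
```

Calling this before copying the file into the stream, and skipping or rejecting the file when it returns false, would avoid silently producing a distorted image.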

It may help someone else though it may not be the best solution.

/**
 * Embed back every bitmap image of the document from the files found in
 * the specified directory.
 * @param doc Document to embed images into
 * @param dir Directory holding the image files to embed
 * @throws Exception
 */
public static void embedImages (PDDocument doc, File dir)
throws Exception {
    if( dir.exists() ) {
        if( !dir.isDirectory() ) {
            // Same fallback directory name as in the extracting phase.
            dir = new File( dir.getCanonicalPath()+"-img" );
            embedImages( doc, dir );
            return;
        }
    }
    else {
        dir.mkdirs();
    }
    Iterator<Entry<COSObjectKey, Integer>> xrefEntriesIt =
        doc.getDocument().getXrefTable().entrySet().iterator();
    while( xrefEntriesIt.hasNext() ) {
        COSObject object = doc.getDocument().getObjectFromPool(
                xrefEntriesIt.next().getKey() );
        // Only image XObjects are processed.
        if( COSName.IMAGE.equals( object.getDictionaryObject( COSName.SUBTYPE ) ) )
            embedSingleImage( object, dir );
    }
}

/**
 * Embeds the image file found in the specified directory into the image
 * stream referenced by a COSObject.
 * The image may be a vector path, which is not handled yet. This is
 * guessed from the imageMask flag. However, there may be better indicators.
 * IMPORTANT NOTICE: The file stream is directly embedded in the old stream.
 * If the image size changes, the final display will show distortion.
 * @param imObj The COSObject referencing the image stream
 * @param dir The directory where the image file is read from
 * @throws Exception
 */
protected static void embedSingleImage( COSObject imObj, File dir )
throws Exception {
    PDXObjectImage im = (PDXObjectImage) PDXObject.createXObject(
            (COSStream) imObj.getObject() );
    if( im.getImageMask() ) return;
    File inFile = new File( dir.getCanonicalPath()+File.separator
            +imObj.getObjectNumber().intValue()+"."+im.getSuffix() );
    if( !inFile.exists() )
        throw new Exception( "The file `"+inFile.getCanonicalPath()
        +"` doesn't exist and cannot be embedded." );
    InputStream newStream = new FileInputStream( inFile );
    OutputStream embeddedStream = im.getCOSStream().createFilteredStream();
    try {
        int bSize = 10240;
        byte[] b = new byte[bSize];
        int bytesRead = 0;
        // Copy the raw file bytes over the old stream content.
        while( ( bytesRead = newStream.read( b, 0, bSize ) ) > -1 )
            embeddedStream.write( b, 0, bytesRead );
    }
    finally {
        embeddedStream.close();
        newStream.close();
    }
}
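
The copy loop above can also be pulled out into a small reusable helper. This is only a sketch using java.io: the names StreamCopy and copyAndClose are hypothetical, and it assumes the caller hands over ownership of both streams, since both are closed even when the copy fails.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of PDFBox.
final class StreamCopy {

    /**
     * Copies every byte from in to out, then closes both streams.
     * Returns the number of bytes copied.
     */
    static long copyAndClose(InputStream in, OutputStream out)
            throws IOException {
        long total = 0;
        try {
            byte[] buffer = new byte[10240];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
                total += bytesRead;
            }
        } finally {
            // Close the input first, and the output even if that fails.
            try {
                in.close();
            } finally {
                out.close();
            }
        }
        return total;
    }
}
```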


Julien PLÉE

On 26 Aug 2010, at 00:47, [email protected] wrote:

Julien,

Doesn't this code[1] create a new image object which is in no way attached to the PDF? Modifying "(COSStream) obj.getObject()" seems like it'd do what you intend. I'm not familiar with PDXObject.createXObject(), but it seems like that'd be creating a copy of the data passed it (similar to a copy constructor). Obviously modifying a copy isn't going to affect the
original.

I'm pretty sure that's your problem, but I've never done anything with
streams or images in PDFs, so I'm afraid I don't know the way it's
supposed to be done.

Another thing which might be important: some PDF programs don't write out
anything in the xref table.  This doesn't follow the spec, but Adobe
Reader opens them fine either way, so many people don't realize they're
out of spec (and thus expect your code to process them the same as a
proper PDF).

[1] PDXObjectImage image = (PDXObjectImage)
PDXObject.createXObject((COSStream) obj.getObject() );

----
Thanks,
Adam





From: Julien Plée <[email protected]>
To: [email protected]
Date: 08/25/2010 15:00
Subject: Replacing images contents



Hello,

I have to put a watermark stamp on images stored in PDF files and I'm
having a hard time trying to embed the images back into the PDF.
I'm using the XrefTable to filter images. For embedding, I'm trying to
replace the stream of the original object, but with no luck: the saved
PDF always looks the same.
Here is my method, focused on the PDXObjectImage:

/**
 * Replaces a PDF image content with content from an image file on file
 * system identified by the object id.
 *
 * (this.doc : PDDocument)
 * @param obj
 * @throws IOException
 */
protected void embedImageBack(COSObject obj) throws IOException
{
    String path = "img/";
    PDXObjectImage image = (PDXObjectImage) PDXObject.createXObject(
            (COSStream) obj.getObject() );
    File inputFile = new File(
            path+obj.getObjectNumber()+"."+image.getSuffix() );
    PDXObjectImage newImage = null;
    if (image.getSuffix().equals("jpg"))
        newImage = new PDJpeg( this.doc, new FileInputStream(inputFile) );
    else
        newImage = new PDCcitt( this.doc,
                (RandomAccess) new RandomAccessFile( inputFile, "r" ) );
    image.getCOSStream().replaceWithStream(newImage.getCOSStream());
    this.shouldSaveDoc = true;
}

After all images have been processed, I save the document in a new
file, but except that the file size changes, nothing else visible
happens.
Thanks for any help.

Julien PLÉE

