All,

I'm new to PDFBox and I'm having some trouble converting the pages in a PDF to 
JPEG images. The issue seems very similar to the one described in this thread:

 http://markmail.org/message/em3rrqz24ip3c7s5

I have used the ImageMagick CLI tools to confirm that the PDF itself is not 
the problem; with them, everything renders just fine. Below, I have linked two 
images of the same page (one created by the ImageMagick CLI and one by PDFBox). 
The PDFBox one is discolored.

 http://www.sitesoftllc.net/images/imagemagick.jpg    (ImageMagick Created)
 http://www.sitesoftllc.net/images/pdfbox.jpg         (PDFBox Created)

To check whether I was doing something wrong, I next extracted the images 
embedded in the PDF. Those appear to be discolored as well (I have linked the 
result below).

 http://www.sitesoftllc.net/images/extractedimage.jpg (Extracted Image)

Has anybody experienced this? Is there a solution? Please find my code below. 
I appreciate any help in finding out what I'm doing wrong.

package com.sitesoftllc.pdf;

import java.awt.image.BufferedImage;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

import com.sun.image.codec.jpeg.JPEGCodec;
import com.sun.image.codec.jpeg.JPEGImageEncoder;

public class PdfBoxPdfReader {

    public static void main(String[] args) {
        PDDocument doc = null;
        try {
            doc = PDDocument.load(new BufferedInputStream(
                    new FileInputStream("files/january.pdf")));
            List pages = doc.getDocumentCatalog().getAllPages();

            System.out.println(doc.getNumberOfPages());

            int pageNumber = 0;
            Iterator it = pages.iterator();
            while (it.hasNext()) {
                PDPage thisPage = (PDPage) it.next();

                // Extract every image embedded in this page.
                PDResources resources = thisPage.getResources();
                Map images = resources.getImages();
                Iterator imageIt = images.keySet().iterator();
                while (imageIt.hasNext()) {
                    String imageKey = (String) imageIt.next();
                    PDXObjectImage pdfImage =
                            (PDXObjectImage) images.get(imageKey);
                    writeJpeg(pdfImage.getRGBImage(),
                            "files/extracted/" + imageKey + ".jpg");
                    System.out.println(imageKey);
                }

                // Render the whole page to a JPEG.
                writeJpeg(thisPage.convertToImage(),
                        "files/pages/page-" + pageNumber + ".jpg");
                pageNumber++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Release the document even if rendering failed.
            if (doc != null) {
                try {
                    doc.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    // Encodes one image as JPEG and always closes the output stream.
    private static void writeJpeg(BufferedImage image, String fileName)
            throws IOException {
        FileOutputStream fos = new FileOutputStream(fileName);
        try {
            JPEGImageEncoder jpgEncoder = JPEGCodec.createJPEGEncoder(fos);
            jpgEncoder.encode(image);
        } finally {
            fos.close();
        }
    }
}
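
One check that might help narrow this down is to take the encoder out of the 
picture. The sketch below is only an illustration, not a confirmed fix: it 
assumes the color shift could come from handing the JPEG encoder an image that 
is not plain three-channel RGB (for example one with an alpha channel), which I 
have not verified for this PDF. The class and method names (JpegWriteCheck, 
toRgb, encodeJpeg) are made up for the example. It uses only the standard 
javax.imageio API instead of the internal com.sun codec.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class JpegWriteCheck {

    // Copies any image onto a white, opaque 8-bit RGB canvas so the
    // JPEG writer receives exactly three channels and no alpha.
    public static BufferedImage toRgb(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    // Encodes the image as JPEG in memory using the standard ImageIO writer.
    public static byte[] encodeJpeg(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(toRgb(image), "jpg", out)) {
            throw new IOException("No JPEG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a page image: a solid red ARGB image.
        BufferedImage sample =
                new BufferedImage(32, 32, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = sample.createGraphics();
        g.setColor(Color.RED);
        g.fillRect(0, 0, 32, 32);
        g.dispose();

        // Round-trip through JPEG and print the decoded center pixel,
        // which should still be close to pure red if colors survive.
        byte[] jpeg = encodeJpeg(sample);
        BufferedImage back = ImageIO.read(new ByteArrayInputStream(jpeg));
        System.out.println(new Color(back.getRGB(16, 16)));
    }
}
```

If the page image from convertToImage() still looks wrong after going through 
encodeJpeg, the encoder is probably not to blame and the problem is more 
likely in how the PDF's images are decoded.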




Dustin Clifford 
SiteSoft L.L.C. 
8063 20th. St. 
Jenison, MI 49428 

e. [email protected] 
p. (616) 901-8693 
f. (616) 667-9622 

