All,

I'm new to PDFBox and I'm having trouble converting the pages of a PDF to JPEG images. The issue looks nearly identical to the one described in this thread:
http://markmail.org/message/em3rrqz24ip3c7s5

I used the ImageMagick CLI tools to rule out the PDF itself as the cause, and everything renders just fine there. Below are two images of the same page, one created by the ImageMagick CLI and one by PDFBox:

http://www.sitesoftllc.net/images/imagemagick.jpg (ImageMagick Created)
http://www.sitesoftllc.net/images/pdfbox.jpg (PDFBox Created)

To check whether I was doing something wrong, I next extracted the images embedded in the PDF. Those appear to be discolored as well (result linked below):

http://www.sitesoftllc.net/images/extractedimage.jpg (Extracted Image)

Has anybody experienced this? Is there a solution? Please find my code below. I appreciate any help in finding out what I'm doing wrong.

    package com.sitesoftllc.pdf;

    import java.awt.image.BufferedImage;
    import java.io.BufferedInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

    import com.sun.image.codec.jpeg.JPEGCodec;
    import com.sun.image.codec.jpeg.JPEGImageEncoder;

    public class PdfBoxPdfReader {

        public static void main(String[] args) {
            PDDocument doc = null;
            try {
                BufferedInputStream bis = new BufferedInputStream(
                        new FileInputStream(new File("files/january.pdf")));
                doc = PDDocument.load(bis);
                List pages = doc.getDocumentCatalog().getAllPages();
                int pageCount = doc.getNumberOfPages();
                System.out.println(pageCount);

                int pageNumber = 0;
                Iterator it = pages.iterator();
                while (it.hasNext()) {
                    String imageType = "jpg";
                    String fileName = "files/pages/page-" + pageNumber + "." + imageType;
                    PDPage thisPage = (PDPage) it.next();

                    // Render the whole page to an in-memory image.
                    BufferedImage image = thisPage.convertToImage();

                    // Extract every image referenced by this page's resources.
                    PDResources resources = thisPage.getResources();
                    Map images = resources.getImages();
                    Iterator imageIt = images.keySet().iterator();
                    while (imageIt.hasNext()) {
                        String imageKey = (String) imageIt.next();
                        PDXObjectImage pdfImage = (PDXObjectImage) images.get(imageKey);
                        BufferedImage bImage = pdfImage.getRGBImage();
                        FileOutputStream imageFos = new FileOutputStream(
                                new File("files/extracted/" + imageKey + ".jpg"));
                        try {
                            JPEGImageEncoder jpgEncoder = JPEGCodec.createJPEGEncoder(imageFos);
                            jpgEncoder.encode(bImage);
                        } finally {
                            imageFos.close();
                        }
                        System.out.println(imageKey);
                    }

                    // Write the rendered page as a JPEG.
                    FileOutputStream pageFos = new FileOutputStream(fileName);
                    try {
                        JPEGImageEncoder jpgEncoder = JPEGCodec.createJPEGEncoder(pageFos);
                        jpgEncoder.encode(image);
                    } finally {
                        pageFos.close();
                    }
                    pageNumber++;
                }
            } catch (Exception e) {
                System.err.println(e.getMessage());
                e.printStackTrace();
            } finally {
                // Release the document even if rendering or encoding failed.
                if (doc != null) {
                    try {
                        doc.close();
                    } catch (IOException ignored) {
                        // nothing useful to do at this point
                    }
                }
            }
        }
    }

Dustin Clifford
SiteSoft L.L.C.
8063 20th. St.
Jenison, MI 49428
p. (616) 901-8693
f. (616) 667-9622
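
One check that may help narrow this down, offered as a sketch rather than a confirmed fix: JPEG has no alpha channel, and handing an encoder a BufferedImage that carries alpha or an unusual pixel layout is a known source of color shifts in the output. The example below redraws an image onto an opaque TYPE_INT_RGB canvas and writes it with the JDK's javax.imageio.ImageIO. The class name RgbJpegWriter and the helpers toRgb and writeJpeg are made up for illustration, and whether the pixel layout is actually the cause of the discoloration in this PDF is an assumption.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class RgbJpegWriter {

    /** Redraws src onto an opaque 24-bit RGB canvas with a white background. */
    public static BufferedImage toRgb(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            // Fill first so transparent source pixels end up white, not black.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    /** Flattens the image to RGB and writes it to target as a JPEG. */
    public static void writeJpeg(BufferedImage image, File target) throws IOException {
        // ImageIO.write returns false when no suitable writer is found.
        if (!ImageIO.write(toRgb(image), "jpg", target)) {
            throw new IOException("No JPEG writer available for " + target);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a rendered page: a small image with an alpha channel.
        BufferedImage argb = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        argb.setRGB(1, 1, 0xFFFF0000); // one opaque red pixel, rest transparent
        File out = File.createTempFile("page-", ".jpg");
        writeJpeg(argb, out);
        System.out.println("Wrote " + out.getAbsolutePath());
    }
}
```

In the program above, this would mean calling writeJpeg(image, new File(fileName)) in place of the JPEGCodec encoder, and likewise for each extracted bImage. If the discoloration comes from the color space of the image data inside the PDF (for example CMYK), this step alone will probably not change the result.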

