Handling Graphics from Scanned PDF

Eliot Kimber Thu, 06 Dec 2012 09:49:29 -0800

I am trying to find QR codes in PDFs that consist of scanned page images. My
code works fine for scans produced by my OfficeJet and for page images
produced by Acrobat, but scans produced by my client's eCopy ShareScan device
(according to the PDF metadata) are not usable.


Looking into the PDF data stream, each page is represented by two images: a
"bg" image that looks like the page image I would expect, but in a very faint
grey, and an "fg" image that reflects the page content but with lots of grey
and ghosting.

The PDF renderer must be combining these two images in some way to provide
the clear image I see in Acrobat.
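For what it's worth, a common way scanners build "compressed" PDF pages is mixed raster content (MRC): a low-resolution background layer, a foreground colour layer, and a high-resolution black-and-white mask that decides which layer shows at each pixel. Whether the eCopy output actually works this way is an assumption. The places to check are the page's content stream (the order in which the images are drawn with the Do operator) and any /Mask or /SMask entry in each image XObject's dictionary. If there is a binary mask, recombining the layers yourself could look roughly like the sketch below. The class and method names are hypothetical, and the "dark mask pixel means foreground" polarity is an assumption that should be checked against the mask's /Decode array.

```java
import java.awt.image.BufferedImage;

/**
 * Hypothetical helper: rebuilds a page from MRC-style layers.
 * Assumption: a dark mask pixel means "show the foreground layer here",
 * a light one means "show the background layer". Check the mask's
 * /Decode array (and whether it is a soft mask) before relying on this.
 */
class MaskComposite {

    static BufferedImage composite(BufferedImage bg, BufferedImage fg, BufferedImage mask) {
        // Output at the mask's resolution, which in MRC is usually the highest.
        int w = mask.getWidth();
        int h = mask.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Treat the mask's red channel as its grey level (0 = black).
                int maskGrey = (mask.getRGB(x, y) >> 16) & 0xFF;
                BufferedImage layer = maskGrey < 128 ? fg : bg;
                // Nearest-neighbour lookup, in case the layer is stored at a lower resolution.
                int lx = x * layer.getWidth() / w;
                int ly = y * layer.getHeight() / h;
                out.setRGB(x, y, layer.getRGB(lx, ly));
            }
        }
        return out;
    }
}
```

The composited image could then be handed to the QR decoder in place of either layer on its own. If the foreground instead carries a soft mask (/SMask), the mask values are per-pixel opacity, and the layers would need to be alpha-blended rather than selected pixel by pixel as above.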

Is there something I can find in the PDF data stream that will tell me how
these images are combined and, if so, can anyone point me in the right
direction for processing these images? I am pretty new to Java image
processing so I'm not sure where to look or what to look for.

The images themselves are reported by PDFBox as PDJpeg objects.

I can provide a sample PDF page if it's needed.

Thanks,

Eliot

-- 
Eliot Kimber
Senior Solutions Architect, RSI Content Solutions
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.rsicms.com
www.rsuitecms.com
Book: DITA For Practitioners, from XML Press,
http://xmlpress.net/publications/dita/practitioners-1/
