PDFBox images extraction.

Silviu-Ilie Arsene-Isbasescu Tue, 12 Apr 2011 07:03:33 -0700

Hello, I would like to use PDFBox to export the images in a PDF file and to
extract the text together with its layout.
As far as I can see in the code, the position, font and font size are
available, so this should be possible if I write a PDF2XML class. Please
correct me if I'm wrong!
I would also like to extract all types of images (CCITT Fax, JPEG,
JPEG2000/JPX). Is that possible? For a CCITT Fax image I got the WARNING:
"getRGBImage returned NULL". I've seen in the sources that JPX/JPEG2000
image decoding is not available right now.
Please confirm, and if possible send a list (or a link to one) of all the
supported features, or at least tell me which PDF versions PDFToImage and
ExtractText support.
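
To make the PDF2XML idea concrete, here is a rough sketch of the output side
only. It assumes the position, font and font size have already been collected
from PDFBox; the class Pdf2XmlSketch, the holder TextRun and the XML layout are
all hypothetical names made up for illustration, and nothing here calls PDFBox
itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class Pdf2XmlSketch {

    /** Hypothetical holder for one run of text plus its layout attributes. */
    public static final class TextRun {
        final String text;
        final String font;
        final float fontSize;
        final float x;
        final float y;

        public TextRun(String text, String font, float fontSize, float x, float y) {
            this.text = text;
            this.font = font;
            this.fontSize = fontSize;
            this.x = x;
            this.y = y;
        }
    }

    /** Escapes the characters that are not allowed verbatim in XML content. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Serializes the runs of one page as a flat list of text elements. */
    public static String toXml(List<TextRun> runs) {
        StringBuilder sb = new StringBuilder("<page>\n");
        for (TextRun r : runs) {
            // Locale.ROOT keeps the decimal separator a dot on every machine.
            sb.append(String.format(Locale.ROOT,
                    "  <text x=\"%.1f\" y=\"%.1f\" font=\"%s\" size=\"%.1f\">%s</text>\n",
                    r.x, r.y, escape(r.font), r.fontSize, escape(r.text)));
        }
        return sb.append("</page>\n").toString();
    }

    public static void main(String[] args) {
        // Sample values invented for the demo, not taken from a real PDF.
        List<TextRun> runs = Arrays.asList(
                new TextRun("Hello", "Helvetica", 12f, 72f, 700f),
                new TextRun("a < b", "Times-Roman", 10f, 72f, 686f));
        System.out.print(toXml(runs));
    }
}
```

The part I am unsure about is whether the text-stripping code can reliably
supply these values for every piece of text.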



Thank you!
