Hello, I would like to use PDFBox to export the images in a PDF file and to
extract the text with layout.
As far as I can see in the code, the position, font and font size are
available, so this should be possible if I write a PDF2XML class. Please
correct me if I'm wrong!
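
To make this concrete, here is a rough sketch of the kind of XML output I have in mind. It does not call PDFBox at all: `Pdf2XmlSketch` and `TextRun` are names I made up for illustration, and I am only assuming that each run's text, position, font name and font size could be copied from what PDFBox exposes during text extraction.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: Pdf2XmlSketch and TextRun are illustrative names,
// not PDFBox classes. The values would have to be filled in from PDFBox.
class Pdf2XmlSketch {

    // One run of text plus the layout attributes to be written out.
    static final class TextRun {
        final String text;
        final float x;
        final float y;
        final String fontName;
        final float fontSize;

        TextRun(String text, float x, float y, String fontName, float fontSize) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.fontName = fontName;
            this.fontSize = fontSize;
        }
    }

    // Escape the five characters that are special in XML.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Write one <text> element per run, wrapped in a single <page> element.
    static String toXml(List<TextRun> runs) {
        StringBuilder sb = new StringBuilder("<page>\n");
        for (TextRun r : runs) {
            sb.append(String.format(Locale.ROOT,
                    "  <text x=\"%.2f\" y=\"%.2f\" font=\"%s\" size=\"%.1f\">%s</text>\n",
                    r.x, r.y, escape(r.fontName), r.fontSize, escape(r.text)));
        }
        sb.append("</page>\n");
        return sb.toString();
    }
}
```

The element and attribute names are only a first guess at a format; the open question is whether PDFBox gives me reliable values to put into them.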
I would also like to extract all types of images (CCITT Fax, JPEG,
JPEG2000, JPX). Is that possible? On a CCITT Fax image I get the WARNING:
"getRGBImage returned NULL". I've seen in the sources that JPX/JPEG2000
image decoding is not available right now.
Please confirm and, if possible, send a list (or a link to one) of all the
supported features, or at least tell me which PDF versions PDFToImage and
ExtractText support.


Thank you!