Hi,
Please ignore my post/mail from this morning. I've found why the
exception you mention could happen and have opened an issue here:
https://github.com/dragon66/icafe/issues/63
Re the PDF content:
The first question is whether your PDF pages are rendered correctly to
images. To verify this, save each image to a PNG file with
ImageIO.write() and look at it. I expect the images to be fine. If they
are not, please upload the PDF files to a file-sharing service.
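A minimal sketch of that check, assuming the rendered pages are already
available as a BufferedImage[] (the class PageDump, the method dumpPages
and the file naming are made up for illustration and are not part of
PDFBox or icafe):

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical debugging helper; only uses standard javax.imageio calls.
class PageDump {

    /**
     * Writes every non-null page image as page-<n>.png into dir and
     * returns the number of files written. Null entries (pages that
     * were never rendered) are reported on stderr and skipped.
     */
    static int dumpPages(BufferedImage[] pages, File dir) throws IOException {
        int written = 0;
        for (int i = 0; i < pages.length; i++) {
            if (pages[i] == null) {
                System.err.println("Page " + (i + 1) + " was not rendered");
                continue;
            }
            File out = new File(dir, "page-" + (i + 1) + ".png");
            // ImageIO.write() returns false if no suitable writer is found
            if (!ImageIO.write(pages[i], "png", out)) {
                throw new IOException("No PNG writer available for " + out);
            }
            written++;
        }
        return written;
    }
}
```

Calling dumpPages() on the image array right before the TIFF is written,
and opening the PNG files in a viewer, shows whether the rendering step
or the TIFF step is at fault.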
You shouldn't edit PDF files by hand unless you know what you're doing.
Changing the font entries the way you did can result in weird effects on
the fonts. The imaging library should be able to work on your images
regardless of their content.
Re: converting into multipage TIFF files, see this "minority" answer by me:
https://stackoverflow.com/a/31974376/535646
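For illustration, one way to write a multipage TIFF with plain ImageIO
is the sequence API of ImageWriter. This is only a sketch under
assumptions: it needs a TIFF writer plugin (built into Java 9 and later,
otherwise a plugin on the classpath), the class name MultiPageTiff is
made up, and the compression name "CCITT T.6" is the one I expect from
the JDK plugin, which is why the code checks for it first. The pages
must be bilevel (TYPE_BYTE_BINARY) for Group 4 compression.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper; uses only the standard javax.imageio API.
class MultiPageTiff {

    /** Writes all pages into one TIFF and returns the file content. */
    static byte[] write(BufferedImage[] pages) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IOException("No TIFF ImageWriter available");
        }
        ImageWriter writer = writers.next();
        ImageWriteParam param = writer.getDefaultWriteParam();

        // Use Group 4 (CCITT T.6) only if the plugin offers it under that name.
        if (param.canWriteCompressed()) {
            String[] types = param.getCompressionTypes();
            if (types != null && Arrays.asList(types).contains("CCITT T.6")) {
                param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
                param.setCompressionType("CCITT T.6");
            }
        }

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageOutputStream ios = ImageIO.createImageOutputStream(baos);
        try {
            writer.setOutput(ios);
            writer.prepareWriteSequence(null);      // start a multi-image file
            for (BufferedImage page : pages) {
                writer.writeToSequence(new IIOImage(page, null, null), param);
            }
            writer.endWriteSequence();              // finish the file
        } finally {
            ios.close();                            // flushes into baos
            writer.dispose();
        }
        return baos.toByteArray();
    }
}
```

Whether this works in an environment where ImageIO plugins cannot be
discovered is a separate question; it is meant for a local comparison.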
Re: converting images to bitonal, also try this library, which I use at
work for a purpose similar to yours:
http://www.jhlabs.com/ip/filters/index.html
I start with a color image and then use a combination of filters to get
a b/w image.
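To illustrate the general idea only (this is not the JH Labs API, and
not the filter chain mentioned above): a plain fixed-threshold
conversion using just java.awt. The class name Bitonal and the threshold
value are assumptions; real documents often need contrast adjustment or
sharpening before this step.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: simplest possible color -> b/w conversion.
class Bitonal {

    /**
     * Returns a TYPE_BYTE_BINARY copy of src. Pixels whose luminance
     * is at or above threshold (0..255) become white, all others black.
     */
    static BufferedImage threshold(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual luminance weights
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luma >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }
}
```

For example, Bitonal.threshold(image, 128) turns light backgrounds white
and dark text black; the threshold usually needs tuning per document
type.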
Tilman
PS: Current PDFBox version is 2.0.7.
On 25.10.2017 at 11:45, Platthaus, Thomas wrote:
Just to give you a short update.
Maybe we have found the reason ...
In the original PDF there are font entries like this:
/Subtype /TrueType
/BaseFont /ArialUnicodeMS-Regular
or
/Subtype /TrueType
/BaseFont /GMOZQV+ArialUnicodeMS-Regular
As the error occurs on the Unix system, we patched the PDF and changed
the entries above to something like this:
/Subtype /Type1
/BaseFont /Helvetica
As a result, the PDF to TIFF conversion works, but the resulting TIFF is
ugly in some parts ...
So it seems that the fonts are read/used somehow during conversion.
2017-10-25 9:54 GMT+02:00 Platthaus, Thomas <[email protected]>:
Thanks for the quick response.
I am sure that the marked line is the correct one, as the trace
shows the line where the IndexOutOfBoundsException happens:
<Trace> at com.icafe4j.image.tiff.TIFFTweaker.writeMultipageTIFF
Line 3154</Trace>
I also checked again that there are 3 pages in the document. By the
way, the same functionality is called for other PDF files with
only one page, and it works fine.
Maybe someone from icafe has another idea what is causing the issue?
2017-10-25 9:33 GMT+02:00 Tilman Hausherr <[email protected]>:
Looks like a bug in icafe. I suspect the exception occurs one line
below the red line: list.get(0) won't work if your PDF has
only 1 page, because "list" would be empty.
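To show the failure mode I mean (this is a made-up illustration, not
the actual icafe code; the class and method names are hypothetical): a
list that only receives the pages after the first one is empty for a
one-page document, so an unguarded get(0) throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an unguarded list.get(0).
class EmptyListDemo {

    /** Collects every page except the first one. */
    static <T> List<T> pagesAfterFirst(T[] pages) {
        List<T> list = new ArrayList<T>();
        for (int i = 1; i < pages.length; i++) {
            list.add(pages[i]);
        }
        return list;
    }

    /** Guarded access: returns null instead of throwing on an empty list. */
    static <T> T secondPageOrNull(T[] pages) {
        List<T> list = pagesAfterFirst(pages);
        return list.isEmpty() ? null : list.get(0);
    }
}
```

With a one-element array, pagesAfterFirst(...).get(0) throws an
IndexOutOfBoundsException, while secondPageOrNull(...) returns null.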
(cc to you two in case you didn't subscribe properly; please
reply to the list only)
Tilman
On 25.10.2017 at 08:02, Platthaus, Thomas wrote:
Dear dragon66,
Dear pdfbox-Team,
We have a problem (txt file with exception attached) using the
icafe4j library under SAP PI 7.4 (running on a Unix system).
First, I would like to briefly describe our project:
We have many PI interfaces receiving invoices from different
customers all over the world. Most of the invoices are
sent as XML. Once we have received the XML, the
first step is to convert it to an internal standard format.
This message in standard format is the basis for
building a PDF invoice using pdfbox (currently v2.0.4).
For some of the interfaces, the next step is to create a
multi-page TIFF file for archiving the invoice; here we use the
icafe4j library (v1.1). It runs fine and we do not have any
problems with the interfaces that are already in production.
We have some interfaces where we receive the incoming invoice as a
PDF file (e.g. via mail, SFTP). We convert these
PDF files to multi-page TIFF, too. A couple of these have been
running in the production environment for many weeks now.
But one of the interfaces (still in development) now throws
an exception while transforming an incoming PDF to TIFF.
The functionality works when tested in the local environment
(Eclipse, Java 1.6, Windows 7) but fails in SAP PI.
At the beginning of the project we saw similar behaviour
using the standard ImageIO classes: everything was fine during
local tests, but there were exceptions in SAP PI. There was some
font problem, and ImageIO.getImageReadersByFormatName and
ImageIO.scanForPlugins didn't work although we tried many
different implementations. We
then decided to use icafe4j and the problem was solved,
until we got another error now :-(
So back to our current failure. In PI we call the Java class
BuildAndSaveTIFFByContainer (code excerpts):
....
// Load PDF File to PDDocument.
pDoc = PDDocument.load(bPdf); // org.apache.pdfbox.pdmodel.PDDocument
oTrace.addDebugMessage("BuildAndSaveTIFFByContainer: Write Stream to PDDocument.");
pageCounter = pDoc.getNumberOfPages();
this.setDynamicConfiguration("http://covestro.com/COV/X01-AP-INVOICE-BROKER",
        "PageCounter", "" + pageCounter);
// Create TIFF from PDF
CreateMultiTIFFFromPDF tiffFromPdf = new CreateMultiTIFFFromPDF();
bTiff = tiffFromPdf.createMultipageTIFF(pDoc);
// Close document
pDoc.close();
....
// Method createMultipageTIFF() from class CreateMultiTIFFFromPDF
public byte[] createMultipageTIFF(PDDocument pddPdf) throws IOException {
    pdDocument = pddPdf;
    byte[] retByteArray = null;
    int dpi = 300;
    PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
    BufferedImage[] images = new BufferedImage[pdDocument.getNumberOfPages()];
    TIFFOptions tiffOptions = new TIFFOptions();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    RandomAccessOutputStream rout = new FileCacheRandomAccessOutputStream(baos);
    for (int pageIdx = 0; pageIdx < pdDocument.getNumberOfPages(); pageIdx++) {
        try {
            BufferedImage imageG =
                    pdfRenderer.renderImageWithDPI(pageIdx, dpi, ImageType.GRAY);
            BufferedImage imageBB = new BufferedImage(imageG.getWidth(),
                    imageG.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
            Graphics2D g2d = imageBB.createGraphics();
            g2d.drawRenderedImage(imageG, null);
            g2d.dispose();
            images[pageIdx] = imageBB;
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    ImageParam.ImageParamBuilder builder = ImageParam.getBuilder();
    ImageParam[] param = new ImageParam[1];
    tiffOptions.setTiffCompression(Compression.CCITTFAX4);
    tiffOptions.setXResolution(dpi);
    tiffOptions.setYResolution(dpi);
    builder.imageOptions(tiffOptions);
    builder.colorType(ImageColorType.BILEVEL);
    param[0] = builder.build();
    TIFFTweaker.writeMultipageTIFF(rout, param, images); // <-- marked (red) line
    retByteArray = baos.toByteArray();
    // Close Document
    pdDocument.close();
    rout.close();
    baos.close();
    return retByteArray;
}
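One detail visible in the excerpt above: if renderImageWithDPI() throws,
the catch block only prints the stack trace, so images[pageIdx] stays
null and a partly filled array is passed on to the TIFF writer. Whether
this is related to the exception is not known. A small check (the class
RenderCheck and the method missingPages are hypothetical names) would at
least rule it out:

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds pages that were never rendered.
class RenderCheck {

    /** Returns the 1-based numbers of all pages whose image is null. */
    static List<Integer> missingPages(BufferedImage[] images) {
        List<Integer> missing = new ArrayList<Integer>();
        for (int i = 0; i < images.length; i++) {
            if (images[i] == null) {
                missing.add(i + 1);
            }
        }
        return missing;
    }
}
```

Logging the result of missingPages(images) right before the TIFF is
written shows whether every page was actually rendered.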
The line marked in red throws an IndexOutOfBoundsException
(details in the attached PDF_TO_TIFF_ERR.txt).
We checked and compared the content of all known PDF files
for our invoice interfaces (incoming or created by pdfbox),
and the only difference we see is the following:
Some PDF files contain barcodes (screenshot
below), and in the screenshot above you can see Filter [
/ASCII85Decode /FlateDecode ]. This is the first time we
have come across ASCII85Decode. But we are not sure whether
this is the reason for our problem. Unfortunately, debugging
in PI is very difficult ...
Do you have any idea what could be the reason for the
IndexOutOfBoundsException here? Or do you know of
similar issues and their solution?
[Screenshot: TIFFTweaker class from icafe4j 1.1]
I would welcome your response.
Thank you!
--
Best regards / Mit freundlichen Grüßen
Thomas Platthaus
EAI & SOA Senior Developer
________________________________
Rhein-Ruhr-Informatik GmbH
Alexanderstraße 50
45472 Mülheim an der Ruhr
Tel +49 208 452358-0
Mob +49 173 2928075
Fax +49 208 452358-10
Email [email protected]
Web http://www.rhein-ruhr-informatik.de
Rhein-Ruhr-Informatik GmbH
District court (Amtsgericht) Essen: HRB 24657
Registered office: Essen
Managing director: Michael Heim