Hi,

Please ignore my post/mail from this morning. I've found why the exception you mention could happen and have opened an issue here:
https://github.com/dragon66/icafe/issues/63

Re the PDF content:
The first question is whether your PDF pages are rendered to images correctly. To verify this, save each rendered BufferedImage to a PNG file with ImageIO.write() and look at the result. I expect the images to be fine. If they are not, please upload the PDF files to a file-sharing site.
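
A minimal sketch of that check, using only the JDK. The class name PageImageDump and the page-N.png file naming are made up for illustration; the images array is assumed to be the one filled by your render loop.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of PDFBox or icafe.
class PageImageDump {

    /** Writes each image as page-N.png into dir and returns the files written. */
    static File[] dumpAsPng(BufferedImage[] images, File dir) {
        File[] written = new File[images.length];
        for (int i = 0; i < images.length; i++) {
            // A null entry means the page was never rendered.
            if (images[i] == null) {
                throw new IllegalStateException("Page " + (i + 1) + " was not rendered");
            }
            File out = new File(dir, "page-" + (i + 1) + ".png");
            try {
                // ImageIO.write returns false if no suitable writer was found.
                if (!ImageIO.write(images[i], "png", out)) {
                    throw new IllegalStateException("No PNG writer for page " + (i + 1));
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            written[i] = out;
        }
        return written;
    }
}
```

Called right after the render loop, this also catches the case where renderImageWithDPI threw for one page: your catch block only prints the stack trace, so that array entry stays null and is passed on to the TIFF writer.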

You shouldn't edit PDF files by hand unless you know what you're doing. Swapping the font entries the way you did can produce odd text rendering, because the rest of the font dictionary no longer matches the font you named.

This imaging tool should be able to work on your image files regardless of the content.

Re: converting into multipage TIFF files, see this "minority" answer by me:
https://stackoverflow.com/a/31974376/535646
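
Independently of the linked answer, here is a rough sketch of writing a multipage TIFF with the standard ImageIO sequence API. It assumes a TIFF ImageIO writer is installed: the JDK ships one since Java 9, older versions need a plugin. It also uses try-with-resources, so it needs Java 7 or later. The class and method names are made up for illustration, and no compression is configured.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper, not part of PDFBox or icafe.
class MultipageTiff {

    /** Writes all pages into one TIFF using the writer's default settings. */
    static byte[] write(BufferedImage[] pages) {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IllegalStateException("No TIFF ImageIO writer available");
        }
        ImageWriter writer = writers.next();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
            writer.setOutput(ios);
            writer.prepareWriteSequence(null);
            for (BufferedImage page : pages) {
                writer.writeToSequence(new IIOImage(page, null, null), null);
            }
            writer.endWriteSequence();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        // The stream is closed (and flushed into baos) at this point.
        return baos.toByteArray();
    }

    /** Reads the TIFF back and returns the number of pages it contains. */
    static int countPages(byte[] tiff) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("tiff");
        if (!readers.hasNext()) {
            throw new IllegalStateException("No TIFF ImageIO reader available");
        }
        ImageReader reader = readers.next();
        try (ImageInputStream iis = ImageIO.createImageInputStream(new ByteArrayInputStream(tiff))) {
            reader.setInput(iis);
            return reader.getNumImages(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            reader.dispose();
        }
    }
}
```

countPages is only there so you can confirm that the output really has as many pages as the PDF.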

Re: converting images to bitonal, try also this library which I use at work for a purpose similar to yours:
http://www.jhlabs.com/ip/filters/index.html
I start with color and then use a combination of filters to get a b/w image.
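
To illustrate the last step only, here is a plain-JDK sketch of a fixed-threshold conversion to a 1-bit image. It does not use the JHLabs filters and is not the filter chain I use at work; the class name, the luminance weights, and the threshold value are assumptions for illustration.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the JHLabs library.
class Bitonal {

    /**
     * Returns a TYPE_BYTE_BINARY copy of src. Pixels whose luminance (0-255)
     * is below the threshold become black, all others white.
     */
    static BufferedImage toBitonal(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual luma weights.
                int lum = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, lum < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}
```

Compared with drawing a gray image into a TYPE_BYTE_BINARY image via Graphics2D, an explicit threshold lets you control where the black/white cut falls, which matters for thin text and barcodes.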

Tilman

PS: Current PDFBox version is 2.0.7.


On 25.10.2017 at 11:45, Platthaus, Thomas wrote:
Just to give you a short update.

Maybe we have found the reason ...

In the original PDF there are font entries like this:

/Subtype /TrueType
/BaseFont /ArialUnicodeMS-Regular
or
/Subtype /TrueType
/BaseFont /GMOZQV+ArialUnicodeMS-Regular

As the error occurs on the Unix system, we patched the PDF and changed the entries above to something like this:

/Subtype /Type1
/BaseFont /Helvetica

As a result, the PDF to TIFF conversion works, but the resulting TIFF looks ugly in some parts ...

So it seems that the fonts are read or used in some way during the conversion.


2017-10-25 9:54 GMT+02:00 Platthaus, Thomas <thomas.platth...@rhein-ruhr-informatik.de>:

    Thx for quick response.

    I am sure that the marked line is the correct one as the trace
    shows the line where IndexOutOfBounds happens:
    <Trace> at com.icafe4j.image.tiff.TIFFTweaker.writeMultipageTIFF
    Line 3154</Trace>

    And I checked again that there are 3 pages in the document. By the
    way, the same functionality is called for other pdf files with
    only one page in document and it works fine.

    Maybe the icafe developers have another idea what is causing the issue!?

    2017-10-25 9:33 GMT+02:00 Tilman Hausherr <thaush...@t-online.de>:

        Looks like a bug in icafe. I suspect the exception is one line
        below the red line: list.get(0) won't work if your PDF has
        only 1 page because "list" would be empty.

        (cc to you two in case you didn't subscribe properly; please
        answer to the list only)

        Tilman

        On 25.10.2017 at 08:02, Platthaus, Thomas wrote:
        Dear dragon66,
        Dear pdfbox-Team,

        We have a problem (text file with the exception attached) using
        the icafe4j library under SAP PI 7.4 (running on a Unix system).

        First, a short description of our project:
        We have many PI interfaces receiving invoices from different
        customers all over the world. Most of the invoices are
        sent as XML. After receiving the XML, the
        first step is to convert it to an internal standard format.
        The message in standard format is the basis for
        building a PDF invoice using pdfbox (currently v2.0.4).
        For some of the interfaces the next step is to create a
        multi-page TIFF file for archiving the invoice; here we use the
        icafe4j library (v1.1). It runs fine and we do not have any
        problems with the interfaces that are already in production.

        We also have some interfaces where the incoming invoice is a
        PDF file (e.g. via mail, sftp). We convert these
        PDF files to multi-page TIFF, too. A couple of these have been
        running in the production environment for many weeks now.

        But one of the interfaces (still in development) now throws
        an exception while converting the incoming PDF to TIFF.
        The conversion works when tested in the local environment
        (Eclipse, Java 1.6, Windows 7) but fails in SAP PI.

        At the beginning of the project we saw similar behaviour
        with the standard ImageIO classes: everything was fine in
        local tests, but there were exceptions in SAP PI. There was
        a font problem, and ImageIO.getImageReadersByFormatName and
        ImageIO.scanForPlugins didn't work although we tried many
        different implementations. We
        therefore decided to use icafe4j, which solved the problem,
        until this new error appeared :-(

        So back to our current failure. In PI we call the java class
        BuildAndSaveTIFFByContainer (excerpts from code):
        ....
        // Load PDF File to PDDocument.
        pDoc = PDDocument.load(bPdf); // org.apache.pdfbox.pdmodel.PDDocument
        oTrace.addDebugMessage("BuildAndSaveTIFFByContainer: Write Stream to PDDocument.");
        pageCounter = pDoc.getNumberOfPages();
        this.setDynamicConfiguration("http://covestro.com/COV/X01-AP-INVOICE-BROKER", "PageCounter", "" + pageCounter);
        // Create TIFF from PDF
        CreateMultiTIFFFromPDF tiffFromPdf = new CreateMultiTIFFFromPDF();
        bTiff = tiffFromPdf.createMultipageTIFF(pDoc);
        // Close document
        pDoc.close();
        ....

        // Method createMultipageTIFF() from class CreateMultiTIFFFromPDF
        public byte[] createMultipageTIFF(PDDocument pddPdf) throws IOException {
            pdDocument = pddPdf;
            byte[] retByteArray = null;
            int dpi = 300;
            PDFRenderer pdfRenderer = new PDFRenderer(pdDocument);
            BufferedImage[] images = new BufferedImage[pdDocument.getNumberOfPages()];
            TIFFOptions tiffOptions = new TIFFOptions();
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            RandomAccessOutputStream rout = new FileCacheRandomAccessOutputStream(baos);
            for (int pageIdx = 0; pageIdx < pdDocument.getNumberOfPages(); pageIdx++) {
                try {
                    BufferedImage imageG = pdfRenderer.renderImageWithDPI(pageIdx, dpi, ImageType.GRAY);
                    BufferedImage imageBB = new BufferedImage(imageG.getWidth(), imageG.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
                    Graphics2D g2d = imageBB.createGraphics();
                    g2d.drawRenderedImage(imageG, null);
                    g2d.dispose();
                    images[pageIdx] = imageBB;
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            ImageParam.ImageParamBuilder builder = ImageParam.getBuilder();
            ImageParam[] param = new ImageParam[1];
            tiffOptions.setTiffCompression(Compression.CCITTFAX4);
            tiffOptions.setXResolution(dpi);
            tiffOptions.setYResolution(dpi);
            builder.imageOptions(tiffOptions);
            builder.colorType(ImageColorType.BILEVEL);
            param[0] = builder.build();
            TIFFTweaker.writeMultipageTIFF(rout, param, images); // <-- marked line
            retByteArray = baos.toByteArray();
            // Close Document
            pdDocument.close();
            rout.close();
            baos.close();
            return retByteArray;
        }

        The marked line (the call to TIFFTweaker.writeMultipageTIFF)
        throws an IndexOutOfBoundsException
        (details in attached PDF_TO_TIFF_ERR.txt).

        We checked and compared the content of all known PDF files
        for our invoice interfaces (incoming or created by pdfbox),
        and the only difference we see is the following:

        [screenshot]

        The PDF files contain some barcodes (screenshot
        below), and in the screenshot above you can see Filter [
        /ASCII85Decode /FlateDecode ]. This is the first time we
        have seen ASCII85Decode, but we are not sure whether it is
        the reason for our problem. Unfortunately, debugging in PI
        is very difficult ...

        [screenshot]

        Do you have any idea what could be the reason for
        IndexOutOfBounds here? Or maybe you know something about
        similar issues and their solution?

        Screenshot of the TIFFTweaker class from icafe4j 1.1:

        [screenshot]

        I would welcome your response.
        Thank you!

--
        Best regards / Mit freundlichen Grüßen
        Thomas Platthaus
        EAI & SOA Senior Developer
        ________________________________


        Rhein-Ruhr-Informatik GmbH
        Alexanderstraße 50
        45472 Mülheim an der Ruhr

        Tel +49 208 452358-0
        Mob +49 173 2928075
        Fax +49 208 452358-10
        Email thomas.platth...@rhein-ruhr-informatik.de
        Web http://www.rhein-ruhr-informatik.de

        Rhein-Ruhr-Informatik GmbH
        Amtsgericht Essen: HRB 24657
        Sitz der Gesellschaft: Essen
        Geschäftsführer: Michael Heim

