Re: How to flatedecode and find all acroform fields in a compressed PDF

Tilman Hausherr Tue, 19 May 2015 14:58:07 -0700

Hi,

The image doesn't appear in the mailing list.

This is all very confusing... /AcroForm is in the document catalog. I don't see how the page content stream is related to it. The best approach is to either go through the source code, or read the spec and then look at the PDF.

To find out what's going on, you'd have to start from that /AcroForm entry and then compare the two files.
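As a first check before comparing files, one could look for the key in the raw bytes. This is only a rough sketch using the plain JDK, not PDFBox; the class and method names (AcroFormProbe, findAcroFormKey) are made up for illustration. It assumes the catalog is stored uncompressed: if the catalog sits inside a compressed object stream, the key will not be visible this way even though the PDF has a form.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox.
class AcroFormProbe {

    // Returns the byte offset of the first "/AcroForm" key, or -1 if none is found.
    // ISO-8859-1 maps every byte to exactly one char, so string index == byte offset.
    static int findAcroFormKey(byte[] pdfBytes) {
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return text.indexOf("/AcroForm");
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the PDF to inspect
        byte[] pdfBytes = Files.readAllBytes(Paths.get(args[0]));
        int offset = findAcroFormKey(pdfBytes);
        System.out.println(offset < 0
                ? "no uncompressed /AcroForm key found"
                : "/AcroForm key at byte offset " + offset);
    }
}
```

Running this on both the original and the decoded file gives a starting offset for reading the surrounding objects in a text editor.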

It is really difficult to help you without the files. The cause could be a bug in PDFBox, or a malformed PDF...


Some more ideas:
- use loadNonSeq(file, null) instead of load(file)

- try the unreleased 2.0 version, which has some improvements in the AcroForm code. Note that the API is different.

https://pdfbox.apache.org/download.cgi#scm
https://pdfbox.apache.org/2.0/getting-started.html

If you still need help, one possibility would be to 1) post the smallest possible code that fails, and 2) post a small part of the raw PDF, i.e. the objects relevant to the field in your code.



Tilman


On 19.05.2015 at 23:03, Balaji Venkatamohan wrote:

Moreover, for every page of the compressed PDF (there are 3 pages), I tried getting the COSStream for each page:

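// PDFBox 1.8 API: fetch the first page from the document catalog
PDPage firstPage = (PDPage) document.getDocumentCatalog().getAllPages().get(0);

// Content stream of the first page and its underlying COS object
PDStream pdStream = firstPage.getContents();
COSStream stream = pdStream.getStream();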

In the above code snippet, the object stream, when analyzed in debug mode, has the following:



The line from the compressed PDF, as opened with Notepad++, is:

<</Filter/FlateDecode/Length 5675>>stream

From this point on, using the COSStream object for every page, how can I decompress it and find the AcroForm fields, given that the unFilteredStream object is null for the COSStream?
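For reference, /Filter /FlateDecode means the stream bytes are zlib-compressed (deflate), so decoding them needs nothing PDF-specific. Below is a minimal sketch using only java.util.zip, not PDFBox; the class and method names (FlateSketch, deflate, inflate) are made up for illustration, and the sample bytes are an invented content stream. Note that a decoded page content stream holds drawing operators; the form fields hang off the /AcroForm entry in the document catalog, not off the page contents.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of PDFBox.
class FlateSketch {

    // Compress bytes into zlib format, as a PDF writer does for /FlateDecode streams.
    static byte[] deflate(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream z = new DeflaterOutputStream(out)) {
            z.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decompress zlib-format bytes, i.e. the bytes between "stream" and "endstream".
    static byte[] inflate(byte[] packed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream z =
                 new InflaterInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = z.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Invented sample: a tiny text-drawing content stream
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] packed = deflate(content);
        byte[] decoded = inflate(packed);
        System.out.println(new String(decoded, StandardCharsets.ISO_8859_1));
    }
}
```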

On Tue, May 19, 2015 at 1:38 PM, Balaji Venkatamohan wrote:


    Thank you for your response Tilman.

    I had previously tried using WriteDecodedDoc on my compressed PDF
    and then tried to get the number of AcroForm fields present in the
    output file generated by WriteDecodedDoc. The API still could not
    find the AcroForm fields in the generated decompressed file. Also,
    the decompressed file generated is 75 KB, which is far smaller than
    the original decompressed file which I have (1.6 MB), though I
    could edit the AcroForm fields using Acrobat Reader.

    Thanks,
    Balaji



    On Tue, May 19, 2015 at 1:18 PM, Tilman Hausherr wrote:

        On 19.05.2015 at 21:35, Balaji Venkatamohan wrote:

            My question is: how do I FlateDecode a PDF so that I can
            find all the AcroForm fields within it? Any help or
            pointers would be highly appreciated.


        You could try the WriteDecodedDoc option of the command line app
        https://pdfbox.apache.org/1.8/commandline.html#writeDecodeDoc

        Maybe you can get further ideas by comparing the two files
        with Notepad++. However, the two files might have their
        objects in a different order.

        Tilman


