Re: Cannot load pre existing PDF to access fields

Roberto Nibali Wed, 26 Aug 2015 02:16:45 -0700

Hi

On Wed, Aug 26, 2015 at 9:27 AM, Maruan Sahyoun <[email protected]>
wrote:


> Hi,
>
> > Am 26.08.2015 um 06:00 schrieb Tolen Miller <[email protected]>:
> >
> > I uploaded my PDF again, if someone wants to see if they can get all
> > of the fields to return: http://1drv.ms/1PRKZsI
> >
> > After looking at the sample provided by Maruan, I noticed that I was
> > not passing in a File object when calling the PDDocument.load()
> > method. Doing so, I now get the same result from Maruan's code (in
> > Eclipse).
> >
> > Now I am unsure how to get *all* of the fields from the PDAcroForm. I
> > am trying to get a collection of the fields, so I can loop through
> > them. When I add this code:
> >
> > List<PDField> pdfFields = form.getFields();
> > for (PDField field : pdfFields) {
> >     System.out.println("PDF Field Full Name: "
> >             .concat(field.getFullyQualifiedName()));
> > }
> >
>
> As there is only one 'root' field, you have to get its kids and walk
> down the field tree. Take a look at
> org.apache.pdfbox.examples.fdf.PrintFields for how to do that.
>
>
Having spent the last two months intensively with form fields, here is my
current code to dump the fields:

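import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField;
import org.apache.pdfbox.pdmodel.interactive.form.PDSignatureField;

private void executeDumpFields(String srcDocName) {
    // try-with-resources closes the document exactly once, even on failure
    try (PDDocument srcDoc = PDDocument.load(new File(srcDocName))) {
        PDAcroForm form = srcDoc.getDocumentCatalog().getAcroForm();
        // getAcroForm() returns null when the PDF contains no form
        if (form != null) {
            form.getFields().forEach(this::dumpField);
        }
    } catch (IOException e) {
        // logerr(...) is a logging helper not shown here
        logerr(e.getMessage());
    }
}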

private void dumpField(PDField srcField) {
    if (srcField instanceof PDNonTerminalField) {
        // non-terminal fields only group other fields: recurse into them
        ((PDNonTerminalField) srcField).getChildren().forEach(this::dumpField);
    } else if (!(srcField instanceof PDSignatureField)) {
        // terminal field: print it (signature fields are skipped)
        System.out.printf("fqName=%s type=%s%n",
                srcField.getFullyQualifiedName(),
                srcField.getClass().getSimpleName());
    }
}

Maybe you can use some of it. Just call executeDumpFields(...) with the
appropriate PDF file name as a string and go from there. Not knowing the
PDF standard and how the dictionary trees are built up inside a PDF, I had
a hard time initially understanding why I needed to go through the PDField
entries recursively.
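
To illustrate why the recursion is needed, here is a minimal sketch that
does not use PDFBox at all. The Node class, the collectLeafNames method and
the field names are all made up for the illustration; only the idea is taken
from the PDF field tree: a field's fully qualified name is the partial names
on the path from the root joined with dots, and only the leaves are fields
you can actually fill in. As a simplification, a node without kids counts as
a terminal field here.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldTreeSketch {

    /** Hypothetical stand-in for a form field; this is not a PDFBox class. */
    static final class Node {
        final String partialName;
        final List<Node> kids = new ArrayList<>();

        Node(String partialName) {
            this.partialName = partialName;
        }

        /** Appends a child and returns this node so calls can be chained. */
        Node add(Node kid) {
            kids.add(kid);
            return this;
        }

        /** Simplification: a node without kids is treated as terminal. */
        boolean isTerminal() {
            return kids.isEmpty();
        }
    }

    /** Walks the tree depth-first and returns the dotted names of all leaves. */
    static List<String> collectLeafNames(Node node, String prefix) {
        String fqName = prefix.isEmpty()
                ? node.partialName
                : prefix + "." + node.partialName;
        List<String> names = new ArrayList<>();
        if (node.isTerminal()) {
            names.add(fqName);
        } else {
            // A non-terminal node only groups other fields: descend into it.
            for (Node kid : node.kids) {
                names.addAll(collectLeafNames(kid, fqName));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // One root field, as in the PDF discussed in this thread.
        Node root = new Node("form1")
                .add(new Node("page1")
                        .add(new Node("firstName"))
                        .add(new Node("lastName")))
                .add(new Node("signatureDate"));
        collectLeafNames(root, "").forEach(System.out::println);
    }
}
```

Looping over the top level alone would only ever see "form1", which matches
what happens when you iterate over getFields() on a form with a single root
field.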

Cheers
Roberto
