Hi,

> On 19.08.2015 at 21:16, Roberto Nibali <[email protected]> wrote:
>
> Hi Tilman
>
> Thanks for your reply ... I did not really succeed. We'll probably end up
> looking at how the PDFDebugger code does it ;).
>
> On Tue, Aug 18, 2015 at 9:08 PM, Tilman Hausherr <[email protected]> wrote:
> On 18.08.2015 at 20:50, Roberto Nibali wrote:
>> Hi
>>
>> I'd like to print out the corresponding object id given a specific form
>> field. How would I do that with PDFBox programmatically?
>>
>> Let's for the sake of the argument, assume that the form field is
>> represented by the following obj:
>>
>> obj 218 0
>> <<
>>   /DA <2B94B0298F2FD7F81F32C6E22043>
>>   /F 4
>>   /FT /Tx
>>   /Ff 4194304
>>   /MK
>>   /P 28 0 R
>>   /Parent 46 0 R
>>   /Rect [159.781 764.53 347.142 777.195]
>>   /Subtype /Widget
>>   /T <5EB6B730886188AB3D3194B9654C18094C>
>>   /Type /Annot
>>   /V <45BBBA249C618BBD3974A4BE61501E57181D>
>>   /AP 666 0 R
>> >>
>>
>> If I am going over all PDField entries of a PDF, how would I get to the
>> underlying obj number (in the above case 218) from a PDField object?
>
> I haven't tried this myself, but I think you could "synchronise" the
> getChildren() results with the getCOSObject().getItem(COSName.KIDS) array,
> i.e. sort out which indirect type is which item returned from getChildren().
> The Kids COSArray has indirect objects (= COSObject type), as seen here:
>
>
>
> COSObject.getObject() returns the dereferenced object.
>
> The reason I asked about this is that while migrating some documents, we
> found out that the originating PDFs not only have textual changes in the PDF
> (mostly legal aspect changes in the fix text); the client in certain cases
> modified the PDFs by adding borders or other graphical elements inside. Those
> obviously do not show up in the template PDF.
>
> My somewhat (maybe stupid) idea was to simply print out the obj id or even
> the whole object and subsequently insert it into the template for the final
> PDF during the form field migration, on top of updating all references to the
> new obj id.
>
> At least for simple geometric shapes, like rectangles, this should be
> feasible, no? Anyway, after constantly getting "null" from the
> getCOSObject().getItem(COSName.KIDS) and nothing out of getChildren() from a
> given PDField, I kind of gave up.
>
> Imagine you had the following code, and wanted to additionally dump out the
> underlying object id and the referencing ids of the PDField:
>
> @Test
> public void excuteDumpFields() throws IOException {
>     PDDocument srcDoc = null;
>     try {
>         srcDoc = PDDocument.load(new File(srcDocName));
>         PDAcroForm acroForm = srcDoc.getDocumentCatalog().getAcroForm();
>         List<PDField> fields = acroForm.getFields();
>         for (PDField field : fields) {
>             dumpField(srcDoc, field);
>         }
>     } catch (Exception e) {
>         logerr(e.getMessage());
>     } finally {
>         // the document is closed here only; no second close() in the try block
>         if (srcDoc != null) {
>             srcDoc.close();
>         }
>     }
> }
>
> private void dumpField(PDDocument srcDoc, PDField srcField) throws IOException {
>     if (srcField instanceof PDNonTerminalField) {
>         // recurse into the children of non-terminal fields
>         for (PDField child : ((PDNonTerminalField) srcField).getChildren()) {
>             dumpField(srcDoc, child);
>         }
>     } else if (!(srcField instanceof PDSignatureField)) {
>         String fqName = srcField.getFullyQualifiedName();
>         String fTypes[] = srcField.getClass().getName().split("\\.");
>         System.out.printf("fqName=%s type=%s%n", fqName, fTypes[fTypes.length-1]);
>     }
> }
>
> It has become customary for me to dump the objects using the pdf-parser
> (http://blog.didierstevens.com/programs/pdf-tools/) as follows to further
> investigate issues (excerpt showing the dump of object 228):
>
> $ python pdf-parser.py -o 228 ../../ccmig2.pdf
>
> obj 228 0
> Type: /Annot
> Referencing: 685 0 R, 28 0 R, 46 0 R, 686 0 R
>
> <<
>   /AA
>     <<
>       /K 685 0 R
>     >>
>   /DA <92F8913CB200CF3C13A363C2D20D>
>   /F 4
>   /FT /Tx
>   /Ff 12582912
>   /MK
>   /MaxLen 1
>   /P 28 0 R
>   /Parent 46 0 R
>   /Q 1
>   /Rect [454.437 769.504 465.482 782.169]
>   /Subtype /Widget
>   /T <8C8A>
>   /Type /Annot
>   /V ()
>   /AP 686 0 R
> >>
>
> And to get the objects referencing object 228:
>
> $ python pdf-parser.py -r 228 ../../ccmig2.pdf
>
> obj 28 0
> Type: /Page
> Referencing: 101 0 R, 217 0 R, 218 0 R, 219 0 R, 220 0 R, 221 0 R, 222 0 R,
> 223 0 R, 224 0 R, 225 0 R, 226 0 R, 227 0 R, 228 0 R, 229 0 R, 230 0 R, 231 0
> R, 232 0 R, 61 0 R, 60 0 R, 62 0 R, 63 0 R, 64 0 R, 65 0 R, 66 0 R, 67 0 R,
> 69 0 R, 68 0 R, 70 0 R, 71 0 R, 72 0 R, 73 0 R, 74 0 R, 75 0 R, 76 0 R, 77 0
> R, 78 0 R, 79 0 R, 80 0 R, 81 0 R, 82 0 R, 83 0 R, 84 0 R, 86 0 R, 87 0 R, 88
> 0 R, 89 0 R, 90 0 R, 91 0 R, 92 0 R, 93 0 R, 94 0 R, 95 0 R, 96 0 R, 97 0 R,
> 85 0 R, 233 0 R, 234 0 R, 235 0 R, 236 0 R, 237 0 R, 238 0 R, 239 0 R, 22 0
> R, 240 0 R, 241 0 R, 242 0 R, 243 0 R, 244 0 R, 245 0 R, 246 0 R, 247 0 R,
> 103 0 R, 248 0 R, 6 0 R, 205 0 R, 206 0 R, 207 0 R, 208 0 R, 209 0 R, 210 0
> R, 211 0 R, 213 0 R, 212 0 R
>
> <<
>   /Annots '[101 0 R 217 0 R 218 0 R 219 0 R 220 0 R 221 0 R 222 0 R 223 0 R
> 224 0 R 225 0 R\n226 0 R 227 0 R 228 0 R 229 0 R 230 0 R 231 0 R 232 0 R 61 0
> R 60 0 R 62 0 R\n63 0 R 64 0 R 65 0 R 66 0 R 67 0 R 69 0 R 68 0 R 70 0 R 71 0
> R 72 0 R\n73 0 R 74 0 R 75 0 R 76 0 R 77 0 R 78 0 R 79 0 R 80 0 R 81 0 R 82 0
> R\n83 0 R 84 0 R 86 0 R 87 0 R 88 0 R 89 0 R 90 0 R 91 0 R 92 0 R 93 0 R\n94
> 0 R 95 0 R 96 0 R 97 0 R 85 0 R 233 0 R 234 0 R 235 0 R 236 0 R 237 0 R\n238
> 0 R 239 0 R 22 0 R 240 0 R 241 0 R 242 0 R 243 0 R 244 0 R 245 0 R 246 0
> R\n247 0 R 103 0 R]'
>   /BleedBox [0.0 0.0 595.276 841.89]
>   /Contents 248 0 R
>   /CropBox [0.0 0.0 595.276 841.89]
>   /MediaBox [0.0 0.0 595.276 841.89]
>   /Parent 6 0 R
>   /Resources
>     <<
>       /ExtGState
>         <<
>           /GS0 205 0 R
>           /GS1 206 0 R
>           /GS2 207 0 R
>           /GS3 208 0 R
>         >>
>       /Font
>         <<
>           /C2_0 209 0 R
>           /C2_1 210 0 R
>           /TT0 211 0 R
>           /TT1 213 0 R
>           /TT2 212 0 R
>         >>
>       /ProcSet [/PDF /Text]
>     >>
>   /Rotate 0
>   /Tabs /W
>   /TrimBox [0.0 0.0 595.276 841.89]
>   /Type /Page
> >>
>
>
> obj 46 0
> Type:
> Referencing: 218 0 R, 230 0 R, 231 0 R, 232 0 R, 219 0 R, 217 0 R, 220 0 R,
> 221 0 R, 222 0 R, 223 0 R, 224 0 R, 225 0 R, 226 0 R, 227 0 R, 228 0 R, 229 0
> R, 17 0 R
>
> <<
>   /Kids '[218 0 R 230 0 R 231 0 R 232 0 R 219 0 R 217 0 R 220 0 R 221 0 R
> 222 0 R 223 0 R\n224 0 R 225 0 R 226 0 R 227 0 R 228 0 R 229 0 R]'
>   /Parent 17 0 R
>   /T <32AB37>
> >>
>
> It would be tremendous if I could get at least the proper object id out of
> the PDFields using PDFBox.

A PDField is uniquely identified by its fully qualified name, which can also be used to find it within the template.

Now, if someone added a border to a field in the source document and you would like to add it to the corresponding field in the template document, that border is part of the widget definition for the field, e.g. the /MK entry. There are also some defaults used by Acrobat: e.g. when a border color is defined, there will be a small border around the field even if no border width is defined.

If I understood your use case correctly, knowing the object id of the field wouldn't help in this case.

BR
Maruan

>
> Take care
> Roberto
>
>
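
For reference, here is a minimal sketch of how the object number asked about in the quoted message could be read, together with the widget's /MK entry mentioned in the reply. It assumes the PDFBox 2.0.x API (`COSDocument.getObjects()`, `PDAcroForm.getFieldTree()`, `PDField.getWidgets()`); other PDFBox versions may differ. The class name `FieldObjectIds` and the helper `indexObjects` are made up for illustration, and the code has not been run against ccmig2.pdf. The idea is to match each field dictionary by identity against the document's indirect objects.

```java
import java.io.File;
import java.io.IOException;
import java.util.IdentityHashMap;
import java.util.Map;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSObject;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

// Hypothetical helper class; assumes PDFBox 2.0.x on the classpath.
public class FieldObjectIds {

    /** Maps every dereferenced object to an "objNr genNr" label, keyed by identity. */
    static Map<COSBase, String> indexObjects(PDDocument doc) {
        Map<COSBase, String> index = new IdentityHashMap<>();
        for (COSObject indirect : doc.getDocument().getObjects()) {
            COSBase target = indirect.getObject();
            if (target != null) {
                index.put(target,
                        indirect.getObjectNumber() + " " + indirect.getGenerationNumber());
            }
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
            if (acroForm == null) {
                return; // no form in this document
            }
            Map<COSBase, String> index = indexObjects(doc);
            COSName mkKey = COSName.getPDFName("MK");
            // getFieldTree() walks terminal and non-terminal fields alike
            for (PDField field : acroForm.getFieldTree()) {
                System.out.printf("fqName=%s obj=%s%n",
                        field.getFullyQualifiedName(), index.get(field.getCOSObject()));
                for (PDAnnotationWidget widget : field.getWidgets()) {
                    // /MK holds the appearance characteristics (border, background)
                    System.out.printf("  widget obj=%s MK=%s%n",
                            index.get(widget.getCOSObject()),
                            widget.getCOSObject().getDictionaryObject(mkKey));
                }
            }
        }
    }
}
```

The lookup prints `null` for anything that is not an indirect object, for example a field created in memory and not yet saved. When the field and its single widget share one dictionary, both lines show the same object number.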

