Huge thanks, Tilman. I’m still learning this stuff. But I did debug the loading of the image PDF, paying close attention to the structure tree. The more I dug in, the more I started to think this was a PDF output issue from Excel. But the spec is large, complex, and obscure in places, so I was still uncertain.
The Cross Reference table view in the PDFDebugger fairly plainly shows the StructTreeRoot, the ParentTree (in StructTreeRoot) and the page content. And it’s plain there are discrepancies between them. Allowable by the spec? No idea.

Mark

> On 12 Sep 2025, at 20:43, Tilman Hausherr <[email protected]> wrote:
>
> https://issues.apache.org/jira/browse/PDFBOX-6067
>
> this code checks for weirdness in a specific file, you need the code
> from PDFMergerUtilityTest:
>
>
>     void testSpecificFile() throws IOException
>     {
>         File file = new File("XXXXX/image2.pdf");
>         try (PDDocument doc = Loader.loadPDF(file))
>         {
>             PDStructureTreeRoot structureTreeRoot =
>                     doc.getDocumentCatalog().getStructureTreeRoot();
>             if (structureTreeRoot != null &&
>                     structureTreeRoot.getParentTree() != null)
>             {
>                 checkWithNumberTree(doc);
>                 checkForPageOrphans(doc);
>                 checkForIDTreeOrphans(doc.getPages(), structureTreeRoot);
>             }
>         }
>         checkStructTreeRootCount(file);
>     }
>
>
> Your file fails with
>
> Element 0:3 from /ParentTree missing in /K ==> expected: <true> but
> was: <false>
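For anyone following along, here is how I read the "from /ParentTree missing in /K" failure, as a toy model in plain Java. This is only a sketch of the idea and not the PDFBox test code: the class name, the method names and the string labels standing in for structure elements are all made up for illustration. The assumption is that every structure element referenced from the /ParentTree should also be reachable by walking /K down from the StructTreeRoot.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model: structure elements are plain string labels,
// and "kids" plays the role of the /K entries of each element.
public class ParentTreeCheck {

    // Collect every element reachable from the root by following /K.
    static Set<String> reachableFromK(String root, Map<String, List<String>> kids) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (seen.add(current)) {
                for (String kid : kids.getOrDefault(current, List.of())) {
                    todo.push(kid);
                }
            }
        }
        return seen;
    }

    // Return the /ParentTree references that the /K walk never reaches.
    static List<String> missingFromK(String root,
                                     Map<String, List<String>> kids,
                                     List<String> parentTreeRefs) {
        Set<String> reachable = reachableFromK(root, kids);
        List<String> missing = new ArrayList<>();
        for (String ref : parentTreeRefs) {
            if (!reachable.contains(ref)) {
                missing.add(ref);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, List<String>> kids = Map.of(
                "root", List.of("doc"),
                "doc", List.of("p1", "p2"));
        // "orphan" is referenced from the parent tree but hangs off no /K entry.
        List<String> parentTreeRefs = List.of("p1", "p2", "orphan");
        System.out.println(missingFromK("root", kids, parentTreeRefs)); // prints [orphan]
    }
}
```

If that reading is right, a non-empty result is the kind of mismatch the failure message reports for my file.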

