Re: Issue with Tagged PDF with Artifact elements in the Structure Tree - artifact not in its parent's list of children

Mark Gibson Sat, 20 Sep 2025 09:15:08 -0700

Huge thanks, Tilman.

I’m still learning this stuff.  But I did debug the loading of the image PDF, 
paying close attention to the structure tree.  The more I dug in, the more I 
started to think this was a PDF output issue from Excel.  But the spec is 
large, complex, and obscure in places.  So I was still uncertain.


The Cross Reference table view in the PDFDebugger shows StructTreeRoot, 
ParentTree (in StructTreeRoot) and the page content fairly plainly.  And it’s 
plain that there are discrepancies between them.  Allowed by the spec?  No idea.
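
To check my own understanding of the kind of mismatch involved, here is a toy 
model of it.  This is only a sketch and it is not PDFBox code: the names 
ParentTreeCheck, Elem and findOrphans are made up for illustration, and I am 
assuming the failure means that an element reachable through /ParentTree names 
a parent whose /K array does not list it.

```java
import java.util.ArrayList;
import java.util.List;

public class ParentTreeCheck {
    /** Hypothetical stand-in for a structure element: a name, a parent (/P) and kids (/K). */
    static final class Elem {
        final String name;
        Elem parent;
        final List<Elem> kids = new ArrayList<>();

        Elem(String name) { this.name = name; }

        /** Links both directions, as a consistent structure tree would. */
        void addKid(Elem kid) {
            kids.add(kid);
            kid.parent = this;
        }
    }

    /**
     * Returns the names of elements referenced from the (toy) parent tree
     * that are not listed among their own parent's kids.
     * Assumes the entries are non-root elements, so a null parent is also reported.
     */
    static List<String> findOrphans(List<Elem> parentTreeEntries) {
        List<String> orphans = new ArrayList<>();
        for (Elem e : parentTreeEntries) {
            if (e.parent == null || !e.parent.kids.contains(e)) {
                orphans.add(e.name);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        Elem root = new Elem("Document");
        Elem para = new Elem("P");
        root.addKid(para);

        // The artifact points at its parent, but the parent never lists it in its kids.
        Elem artifact = new Elem("Artifact");
        artifact.parent = root;

        System.out.println(findOrphans(List.of(para, artifact))); // prints [Artifact]
    }
}
```

In this model the paragraph is linked in both directions and passes, while the 
artifact is only linked upwards and gets reported.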

Mark 

> On 12 Sep 2025, at 20:43, Tilman Hausherr <[email protected]> wrote:
> 
> https://issues.apache.org/jira/browse/PDFBOX-6067
> 
> this code checks for weirdness in a specific file, you need the code
> from PDFMergerUtilityTest:
> 
> 
>    void testSpecificFile() throws IOException
>    {
>        File file = new File("XXXXX/image2.pdf");
>        try (PDDocument doc = Loader.loadPDF(file))
>        {
>            PDStructureTreeRoot structureTreeRoot =
>                    doc.getDocumentCatalog().getStructureTreeRoot();
>            if (structureTreeRoot != null &&
>                    structureTreeRoot.getParentTree() != null)
>            {
>                checkWithNumberTree(doc);
>                checkForPageOrphans(doc);
>                checkForIDTreeOrphans(doc.getPages(), structureTreeRoot);
>            }
>        }
>        checkStructTreeRootCount(file);
>    }
> 
> 
> Your file fails with
> 
> Element 0:3 from /ParentTree missing in /K  ==> expected: <true> but was: <false>
> 
> 
