Thanks, Tilman

I reviewed both PDFs with PAC. They are by no means “clean”, but I don’t
see any errors or warnings related to the areas of the PDF that I know are
causing the issue with the “Artifact” elements.

I’ve tried to link them on Google Drive. I hope they’re accessible:

https://drive.google.com/file/d/1CmRKObEDGxkBjm8wOsCoN0MuMaDFz0w1/view?usp=sharing
https://drive.google.com/file/d/1XAz1650PYY16CPmUhoPRTDKgc_Q4HVpR/view?usp=sharing

I have started trying to debug the loading of the PDF, but it’s quite an
overwhelming task when you are completely new to the codebase.

Let me know if there’s anything else I can do.  Happy to do as much as 
necessary, but I’m at a bit of a loss where to turn next.

Thanks
Mark

From: Tilman Hausherr <[email protected]>
Sent: 12 September 2025 04:25
To: [email protected]
Subject: Re: Issue with Tagged PDF with Artifact elements in the Structure Tree 
- artifact not in its parent's list of children

Hi,

You need to upload the PDFs to a file-sharing host.
It would be best to check your PDF with PAC:
https://pac.pdf-accessibility.org/en/download

Tilman

Am 11.09.2025 um 18:40 schrieb Mark Gibson:
Hi

We have some PDFs that are exported directly from Excel. They export as
accessible, tagged PDFs with a structure tree.

Within the structure tree are elements of type “Artifact”, often used for 
non-content aspects like background colors, etc.

When PDFBox (both v2 and v3) reads these PDFs (visible using the PDFBox 
Debugger as seen in attached png, as well as just straight up in code), there 
seems to be a structure tree discrepancy with some parent-child relationships.  
The Artifact element (found in the structure tree) has a pointer back to its 
parent.  That parent has a list of children.  I’d expect that list of children 
to include the Artifact.  However, artifacts are never in their parent’s list 
of children.
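
To make the relationship I expected concrete, here is a minimal sketch in Java. It uses a toy model of a structure tree, not the PDFBox API: the `Node` class, its `parent` and `kids` fields, and the `findOrphans` helper are all hypothetical names made up for illustration. It only shows the check itself, namely that every node with a parent pointer should appear in that parent's list of children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a structure tree; this is NOT the PDFBox API. */
final class StructTreeCheck {

    /** Hypothetical node: a type name, a parent pointer and a list of kids. */
    static final class Node {
        final String type;
        Node parent;
        final List<Node> kids = new ArrayList<>();

        Node(String type) {
            this.type = type;
        }

        /** Links the child to this node in both directions and returns the child. */
        Node addKid(Node child) {
            child.parent = this;
            kids.add(child);
            return child;
        }
    }

    /** Returns every node whose parent does not list it among its kids. */
    static List<Node> findOrphans(List<Node> nodes) {
        List<Node> orphans = new ArrayList<>();
        for (Node n : nodes) {
            if (n.parent == null) {
                continue; // the root has no parent to check against
            }
            boolean listed = false;
            for (Node k : n.parent.kids) {
                if (k == n) { // identity comparison: same object, not an equal copy
                    listed = true;
                    break;
                }
            }
            if (!listed) {
                orphans.add(n);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        Node root = new Node("Document");
        Node table = root.addKid(new Node("Table"));

        // Mimics what we observe: the Artifact points at its parent,
        // but the parent does not list it as a kid.
        Node artifact = new Node("Artifact");
        artifact.parent = table;

        for (Node n : findOrphans(Arrays.asList(root, table, artifact))) {
            System.out.println(n.type + " is missing from the kids of " + n.parent.type);
        }
    }
}
```

In this toy example only the Artifact is reported, which matches what we see in our files. How the nodes are collected from a real PDF is left out here on purpose.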

I’m trying to find out if this is expected and part of the PDF spec, or a bug
in PDFBox. This is currently causing issues for us in FOP when we’re rendering
accessible PDF outputs: when importing these PDF image files, the import fails
and they never show up in the final PDF output. Ultimately, I’m trying to
understand whether the fix should be in PDFBox or FOP.

I’ve attached two example PDFs, along with an image of the structure tree of 
one of them highlighting the issue.

Many thanks
Mark


