Yeah, it's weird. In image2.pdf the effect is with 17, 19 and 21: these
elements are missing from the /K hierarchy. I don't know enough to decide
whether it is a bug or not. I do have some understanding of the
structure tree, but it's not perfect.
The specification has this: "The hierarchical relationship among
structure elements shall be represented entirely by the K entries of the
structure element dictionaries, not by nesting of the associated content
items."
I don't understand the end of the sentence ("not by..."), but I'd say
that the word "entirely" means your elements are missing.
What I don't understand is why PAC doesn't complain.
It's definitely not a PDFBox bug. PDFBox just shows what is there. If you
suspect that the parser is broken, open the file with a different tool,
e.g. RUPS.
I have written a tool to detect this problem; I wonder whether it occurs
in our test set.
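For illustration only, the kind of check meant here can be sketched as
below. This is not the actual tool and it does not use the PDFBox API:
Elem is a hypothetical stand-in for a structure element dictionary,
holding just its /S type, its /P parent and the element kids from its /K
entry. The sketch reports every element whose parent does not list it
in /K.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class StructTreeCheck {
    /** Hypothetical model of a structure element (not a PDFBox class). */
    static final class Elem {
        final String type;                          // /S, e.g. "Artifact"
        final Elem parent;                          // /P, null for the root
        final List<Elem> kids = new ArrayList<>();  // element kids from /K

        Elem(String type, Elem parent) {
            this.type = type;
            this.parent = parent;
        }
    }

    /**
     * Returns the candidates that name a parent via /P but do not appear
     * in that parent's /K list. Elem does not override equals(), so
     * contains() compares by object identity.
     */
    static List<Elem> missingFromParentKids(Collection<Elem> candidates) {
        List<Elem> missing = new ArrayList<>();
        for (Elem e : candidates) {
            if (e.parent != null && !e.parent.kids.contains(e)) {
                missing.add(e);
            }
        }
        return missing;
    }
}
```

With a real file, the candidates would have to come from somewhere other
than the /K hierarchy, since the affected elements are not reachable
that way. I'd assume the /ParentTree of the structure tree root is the
place to collect them from.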
Tilman
On 11.09.2025 at 18:40, Mark Gibson wrote:
Hi
We have some PDFs that are directly exported from Excel. They are
exported as accessible, tagged PDFs with a structure tree.
Within the structure tree are elements of type “Artifact”, often used
for non-content aspects like background colors, etc.
When PDFBox (both v2 and v3) reads these PDFs (visible using the
PDFBox Debugger as seen in attached png, as well as just straight up
in code), there seems to be a structure tree discrepancy with some
parent-child relationships. The Artifact element (found in the
structure tree) has a pointer back to its parent. That parent has a
list of children. I’d expect that list of children to include the
Artifact. However, artifacts are never in their parent’s list of
children.
I’m trying to find out if this is expected and part of the PDF spec,
or a bug in PDFBox. This is currently causing issues for us in FOP
when we’re rendering accessible PDF outputs: when importing these PDF
image files, the import fails and they never show up in the final PDF output.
Ultimately, I’m trying to understand if the fix should be in PDFBox or
FOP.
I’ve attached two example PDFs, along with an image of the structure
tree of one of them highlighting the issue.
Many thanks
Mark