Hi Tim,
As Simon is currently unavailable, I have taken over handling this on
the XWiki side.
On 2025/09/08 14:15:41 Tim Allison wrote:
> Simon,
> I'm sorry for my delay. I'm hesitant to share the triggering PDF
> even offline.
> I just added unit tests that confirm the fix for StAX processing:
> https://github.com/apache/tika/pull/2318 . Will that be of any use to
> you? The stax tests failed before the fix.
We've managed to produce a triggering PDF ourselves that exposes both
URL and file contents when extracting its text with Tika, so no need to
share anything. What we found with this PDF, and also with the linked
test (thank you very much!), is that the vulnerability doesn't reproduce
with Woodstox as the StAX implementation. This is because Woodstox
actually honors the IGNORING_STAX_ENTITY_RESOLVER: it supports the String
return type, and even if that type weren't supported, it wouldn't
silently ignore the resolver's result as long as the return value isn't
null. The corresponding code in Woodstox is
https://github.com/FasterXML/woodstox/blob/bfde796d30f074e51960cc681e8ab478bcbbedd3/src/main/java/com/ctc/wstx/io/DefaultInputResolver.java#L150-L158
So for now, based on this analysis, we assume that XWiki (which uses
Woodstox) isn't affected by CVE-2025-54988.
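To make the resolver point concrete, here is a minimal sketch of what I
mean. It assumes the resolver simply returns an empty String (my reading
of the "String return type" remark above); the class and constant names
are made up for illustration and are not Tika's actual code. Since
XMLResolver.resolveEntity is declared to return Object, it is up to each
StAX implementation to decide which return types it accepts.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;

// Hypothetical demo class, not part of Tika or XWiki.
class IgnoringResolverSketch {

    // Assumption: an "ignoring" resolver that answers every external
    // entity lookup with an empty String instead of fetching anything.
    static final XMLResolver IGNORING_RESOLVER =
            (publicId, systemId, baseUri, namespace) -> "";

    // Installs the resolver on whichever StAX implementation is on the
    // classpath. Whether the String result is honored is decided by that
    // implementation, not by this code.
    static XMLInputFactory newFactoryWithIgnoringResolver() {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setXMLResolver(IGNORING_RESOLVER);
        return factory;
    }

    public static void main(String[] args) throws XMLStreamException {
        XMLInputFactory factory = newFactoryWithIgnoringResolver();
        System.out.println("StAX implementation: " + factory.getClass().getName());
        Object resolved = IGNORING_RESOLVER.resolveEntity(
                null, "file:///does/not/matter", null, "xxe");
        System.out.println("Resolver returned an empty String: " + "".equals(resolved));
    }
}
```

Printing the factory class is only there to make it obvious which
implementation is being exercised when comparing runs with and without
Woodstox on the classpath.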
Unfortunately, we also noticed that the fix in Tika breaks parsing of
PDFs with XFA when Woodstox is in use, because Woodstox doesn't support
the XMLConstants.ACCESS_EXTERNAL_DTD property - see also
https://github.com/FasterXML/woodstox/issues/162.
This can be reproduced with the mentioned unit tests: when Woodstox is
added as a dependency, they fail with
java.lang.IllegalArgumentException: Unrecognized property
'http://javax.xml.XMLConstants/property/accessExternalDTD'
Would it be possible to fix this, e.g., by catching this exception, as
is already done for all other properties? From what I understand from
https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html#xmlinputfactory-a-stax-parser,
setting XMLInputFactory.SUPPORT_DTD to false should be sufficient as
protection, so I think Tika shouldn't fail if
XMLConstants.ACCESS_EXTERNAL_DTD isn't supported - I would rather have
expected it to fail if setting XMLInputFactory.SUPPORT_DTD didn't work.
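As a rough sketch of the behavior I have in mind (this is not Tika's
actual code; the class and method names are hypothetical): treat
SUPPORT_DTD as mandatory, and treat the other properties as best effort
by catching the IllegalArgumentException that
XMLInputFactory.setProperty throws for unsupported properties.

```java
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper, only meant to illustrate the proposal.
class HardenedStaxFactorySketch {

    // Returns true if the implementation accepted the property and false
    // if it rejected it as unsupported.
    static boolean trySetProperty(XMLInputFactory factory, String key, Object value) {
        try {
            factory.setProperty(key, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    static XMLInputFactory create() {
        XMLInputFactory factory = XMLInputFactory.newFactory();

        // Mandatory: without this, DTD processing stays enabled, so fail loudly.
        if (!trySetProperty(factory, XMLInputFactory.SUPPORT_DTD, false)) {
            throw new IllegalStateException(
                    "StAX implementation does not allow disabling DTD support: "
                            + factory.getClass().getName());
        }

        // Best effort: extra hardening where the implementation supports it.
        trySetProperty(factory, XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        // An empty value means no protocol is allowed for external DTDs.
        // Implementations that do not know this property are skipped.
        trySetProperty(factory, XMLConstants.ACCESS_EXTERNAL_DTD, "");

        return factory;
    }

    public static void main(String[] args) {
        XMLInputFactory factory = create();
        System.out.println("Created hardened factory: " + factory.getClass().getName());
    }
}
```

The design choice here is simply that the property which actually
provides the protection decides between success and failure, while
implementation-specific extras never cause a hard failure.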
Thank you very much!
Michael