Hi Tim,

As Simon is currently unavailable, I have taken over handling this on the XWiki side.

On 2025/09/08 14:15:41 Tim Allison wrote:
> Simon,
>   I'm sorry for my delay. I'm hesitant to share the triggering PDF
> even offline.
>
>   I just added unit tests that confirm the fix for StAX processing:
> https://github.com/apache/tika/pull/2318 . Will that be of any use to
> you? The stax tests failed before the fix.

We've managed to produce a triggering PDF ourselves that exposes both URL and file contents when its text is extracted with Tika, so there is no need to share anything. What we found with this PDF, and also with the linked test (thank you very much!), is that the vulnerability doesn't reproduce with Woodstox as the StAX XML API implementation. The reason is that Woodstox actually uses the IGNORING_STAX_ENTITY_RESOLVER: it supports the String return type, and even if the return type weren't supported, it wouldn't ignore the resolver's result as long as the return value isn't null. The corresponding code in Woodstox is
https://github.com/FasterXML/woodstox/blob/bfde796d30f074e51960cc681e8ab478bcbbedd3/src/main/java/com/ctc/wstx/io/DefaultInputResolver.java#L150-L158

So for now, based on this analysis, we assume that XWiki (which uses Woodstox) isn't affected by CVE-2025-54988.

Unfortunately, we also noticed that the fix in Tika breaks parsing of PDFs containing XFA when Woodstox is used, because Woodstox doesn't support the XMLConstants.ACCESS_EXTERNAL_DTD property; see also https://github.com/FasterXML/woodstox/issues/162.

This can be reproduced with the unit tests mentioned above; they fail with

java.lang.IllegalArgumentException: Unrecognized property 'http://javax.xml.XMLConstants/property/accessExternalDTD'

when Woodstox is added as a dependency.

Would it be possible to fix this, e.g., by catching this exception, as is already done for all other properties? From what I understand from https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html#xmlinputfactory-a-stax-parser, disabling XMLInputFactory.SUPPORT_DTD should be sufficient as protection, so I think Tika shouldn't fail if XMLConstants.ACCESS_EXTERNAL_DTD isn't supported. I would rather have expected it to fail if setting XMLInputFactory.SUPPORT_DTD didn't work.
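
A minimal sketch of the handling suggested above, for illustration only. The class name StaxHardening and the helpers trySetProperty and newHardenedFactory are hypothetical and are not Tika's actual code; the sketch only assumes that a StAX implementation rejects an unknown property with an IllegalArgumentException, as in the Woodstox error quoted above.

```java
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper, not taken from Tika: shows "mandatory" versus
// "best effort" hardening of a StAX XMLInputFactory.
final class StaxHardening {

    private StaxHardening() {
    }

    // Tries to set a property and reports whether the implementation accepted it.
    // Implementations signal an unsupported property with IllegalArgumentException.
    static boolean trySetProperty(XMLInputFactory factory, String name, Object value) {
        try {
            factory.setProperty(name, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    static XMLInputFactory newHardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();

        // Mandatory: if DTD support cannot be switched off, let the exception
        // propagate instead of continuing with an unprotected factory.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);

        // Best effort: extra hardening that not every implementation supports.
        trySetProperty(factory, XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        trySetProperty(factory, XMLConstants.ACCESS_EXTERNAL_DTD, "");

        return factory;
    }
}
```

With this split, an implementation that doesn't recognize ACCESS_EXTERNAL_DTD still yields a usable factory, while a failure to disable SUPPORT_DTD remains a hard error.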

Thank you very much!

Michael
