> setExtractAcroFormContent(false)
Lol... that's what I was looking for but grepping for xfa didn't find
that... need more coffee.

Yes, of course, that would solve it. You can configure that via tika-config.xml.
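
An untested sketch of what that could look like in tika-config.xml,
assuming the usual parser/params layout (the exact schema can differ
between Tika versions, so please check it against the version you run).
As Tilman notes, this also drops the classic AcroForm content:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Keep the default parsers, but take PDFParser out so it can be
         configured separately below. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <!-- Equivalent of setExtractAcroFormContent(false). -->
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="extractAcroFormContent" type="bool">false</param>
      </params>
    </parser>
  </parsers>
</properties>
```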

On Fri, Aug 22, 2025 at 9:24 AM Tilman Hausherr <[email protected]> wrote:
>
> Am 22.08.2025 um 14:38 schrieb Tim Allison:
> > Unfortunately, there's no way via configuration to tell Tika to avoid
> > parsing XFA.
>
> I've been trying to research this but somehow I messed up my IDE while
> working on TIKA-4470 so I can't properly test right now. I was wondering
> whether disabling acroform (setExtractAcroFormContent(false)) would work
> (although we'd lose the classic form content as well), or if we could
> exclude the XMP parser. (There are two occurrences of XFA usage in
> AbstractPDF2XHTML.java)
>
> Another solution would be to check PDF files with PDFBox (easy), and
> also check for attachments (less easy because there are two types of
> attachments).
>
> Tilman
>
