Hello, I am attempting to process some .msg files with a considerable amount of nested content. They throw "Suspected zip bomb: 100 levels of XML element nesting" when I run the out-of-the-box HtmlParser + EmbeddedContentHandler/BodyContentHandler on the bodies of the emails. Unfortunately, I am not able to share these emails for review.
I have been trying to figure out how to change the maximum allowed depth for the content handlers that HtmlParser uses, but I'm a bit lost in the weeds working out how the ContentHandlerDecorator pattern works and how the SecureContentHandler can be loaded with a different config. I looked over TIKA-2091, but it seems to be specific to a different project. All my other googling turned up Solr-specific StackOverflow threads for this zip bomb error. Could someone point me to documentation on what changes are needed to increase this limit (or tell me whether I can change it in tika-config.xml instead)?
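For anyone else lost on the decorator part, here is a minimal sketch of the idea using only JDK SAX classes: each handler wraps the next one, forwards every event, and may veto some of them. `DepthLimitingHandler` and `NestingDemo` are hypothetical names made up for this illustration, not Tika classes. As far as I can tell, Tika's own SecureContentHandler (a ContentHandlerDecorator) has a comparable `setMaximumDepth(int)` setter with a default of 100, and it is AutoDetectParser, not HtmlParser, that wraps your handler in it. Treat both points as assumptions to check against the Javadoc for your Tika version.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical decorator (NOT a Tika class): forwards every SAX event to the
// wrapped handler, but fails once element nesting exceeds maxDepth.
class DepthLimitingHandler extends XMLFilterImpl {
    private final int maxDepth;   // deepest nesting level that is still allowed
    private int currentDepth = 0;

    DepthLimitingHandler(ContentHandler downstream, int maxDepth) {
        this.maxDepth = maxDepth;
        setContentHandler(downstream); // XMLFilterImpl forwards events to this handler
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        currentDepth++;
        if (currentDepth > maxDepth) {
            throw new SAXException(currentDepth + " levels of XML element nesting");
        }
        super.startElement(uri, localName, qName, atts); // delegate to the wrapped handler
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        currentDepth--;
        super.endElement(uri, localName, qName);
    }
}

class NestingDemo {
    // Builds a well-formed document with `depth` nested <div> elements.
    static String nestedXml(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append("<div>");
        sb.append("text");
        for (int i = 0; i < depth; i++) sb.append("</div>");
        return sb.toString();
    }

    // Returns true if the document parses without tripping the depth limit.
    // Any SAX error (including the depth limit) counts as a rejection here.
    static boolean parsesWithin(int depth, int maxDepth) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // The innermost handler would do the real work (like BodyContentHandler);
            // here it simply ignores events. The limit lives only in the outer decorator.
            reader.setContentHandler(new DepthLimitingHandler(new DefaultHandler(), maxDepth));
            reader.parse(new InputSource(new StringReader(nestedXml(depth))));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `NestingDemo.parsesWithin(11, 10)` returns false while `NestingDemo.parsesWithin(11, 20)` returns true. In this sketch, raising the limit on the outer decorator is what lets deeper documents through; swapping the inner handler alone would not change anything.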