Hi,

When using Tika Server, I've run into large compressed files causing
out-of-memory (OOM) errors, which is reducing availability. Are there any
config flags for limiting text extraction based on size? Normally I would
check the file size before sending it to Tika, but for compressed files I
don't know the uncompressed size in advance.
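As a stopgap until there is a server-side limit, one option is a client-side
pre-check that streams through the compressed file and stops as soon as the
uncompressed size passes a threshold, keeping memory bounded to one buffer.
This is a minimal sketch, not a Tika feature: it assumes a single gzip stream
(zip/tar archives would need per-entry handling), and the class and method
names (`GzipSizeCheck`, `exceedsUncompressedLimit`) are hypothetical. It costs
one extra decompression pass on the client.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSizeCheck {

    // Hypothetical helper: decompresses in fixed-size chunks, discards the
    // output, and returns true once the uncompressed size exceeds maxBytes.
    static boolean exceedsUncompressedLimit(InputStream compressed, long maxBytes)
            throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        try (InputStream in = new GZIPInputStream(compressed)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
                if (total > maxBytes) {
                    return true; // stop early; no need to decompress the rest
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Build a gzip payload in memory: 1 MiB of zeros compresses very well.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(new byte[1 << 20]);
        }
        byte[] gzipped = buf.toByteArray();

        // 1 MiB uncompressed vs. a 512 KiB limit, then vs. a 2 MiB limit.
        System.out.println(exceedsUncompressedLimit(new ByteArrayInputStream(gzipped), 512 * 1024));
        System.out.println(exceedsUncompressedLimit(new ByteArrayInputStream(gzipped), 2 * 1024 * 1024));
    }
}
```

Run as-is, this prints `true` and then `false`. Files that exceed the limit
could then be rejected or routed elsewhere instead of being sent to Tika.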

So far I've tried adding the following to my `tika-config.xml`, but I'm not
sure whether this parameter is actually loaded from the config and passed to
the parser; in my testing it had no visible effect. I'm also not sure it
would solve my problem even if it were applied.

<parser class="org.apache.tika.parser.pkg.CompressorParser">
  <params>
    <param name="memoryLimitInKb" type="int">100000</param>
  </params>
</parser>

I'm currently running Tika Server Standard 2.3.0.

Thanks.
