Hi,

My final verdict: the culprit is the upgrade to Tika 1.17. If I downgrade only 
the Tika libraries back to 1.16 and keep the rest of Solr 7.4.0, the heap usage 
after about 85% of the indexing run, measured right after manually triggering 
the garbage collector, is about 60-70 MB (that low!).

My problem now is that we have several setups that trigger this reliably, but 
there is no simple test case that "fails" when Tika 1.17 or 1.18 is used. I also 
do not know whether the error is inside Tika or inside the glue code that makes 
Tika usable in Solr.
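
As a starting point for such a test case, here is a minimal sketch of how the 
"heap after a manual GC" measurement could be automated. It uses only the JDK; 
the class name HeapProbe and the stand-in workload are hypothetical and are not 
Tika or Solr code. The idea would be to replace the workload with a call that 
parses one of the problematic files and compare the growth between Tika 
versions. Note that a GC request is only a hint to the JVM, so the numbers are 
approximate.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

public class HeapProbe {

    /** Requests a garbage collection, then returns the used heap in bytes. */
    public static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // equivalent to System.gc(); the JVM may ignore it
        return memory.getHeapMemoryUsage().getUsed();
    }

    /**
     * Runs the workload the given number of times and returns how much the
     * used heap grew (in bytes), measured after a GC request on both sides.
     */
    public static long heapGrowth(Runnable workload, int iterations) {
        long before = usedHeapAfterGc();
        for (int i = 0; i < iterations; i++) {
            workload.run();
        }
        long after = usedHeapAfterGc();
        return after - before;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in that deliberately retains 1 MB per call.
        // Replace it with the real extraction call to be investigated.
        List<byte[]> retained = new ArrayList<>();
        long growth = heapGrowth(() -> retained.add(new byte[1024 * 1024]), 50);
        System.out.printf("heap growth: %d MB (retained %d blocks)%n",
                growth / (1024 * 1024), retained.size());
    }
}
```

A workload that does not leak should show growth close to zero after enough 
iterations, while a leaking one should grow roughly in proportion to the 
iteration count.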

Should I file an issue for this?

kind regards,

Thomas


> On 02.08.2018 at 12:06, Thomas Scheffler 
> <thomas.scheff...@uni-jena.de> wrote:
> 
> Hi,
> 
> We noticed a memory leak in a rather small setup: 40,000 metadata documents 
> with nearly as many files, which have "literal.*" fields attached. While 
> 7.2.1 brought some Tika issues (due to a beta version), the real problems 
> started to appear with version 7.3.0 and are still unresolved in 7.4.0. 
> Memory consumption is through the roof: where a 512 MB heap was previously 
> enough, now 6 GB isn't enough to index all files.
> I am now at a point where I can track this down to the libraries in 
> solr-7.4.0/contrib/extraction/lib/. If I replace them all with the libraries 
> shipped with 7.2.1, the problem disappears. As most files are PDF documents, 
> I tried updating PDFBox to 2.0.11 and Tika to 1.18, which did not solve the 
> problem. Next I will try downgrading just these two libraries to 2.0.6 
> and 1.16 to see whether they are the source of the memory leak.
> 
> In the meantime, I would like to know whether anybody else has experienced 
> the same problem.
> 
> kind regards,
> 
> Thomas

