Memory leaks in DIH ------------------- Key: SOLR-1042 URL: https://issues.apache.org/jira/browse/SOLR-1042 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3 Reporter: Ryuuichi Kumai Attachments: SOLR-1042.patch
If delta-import is executed many times, the heap utilization grows up and finally OutOfMemoryError occurs. When delta-import is executed with SqlEntityProcessor, the instances of TemplateString cached in VariableResolverImpl#TEMPLATE_STRING#cache. If the deltaQuery contains variable like `last_index_time', the cached values never used increases. Similarly, I guess that the cache increases when fetching each modified row with primary key. I think these queries should not be cached. I came up with two solutions: 1) Not to cache queries to get modified rows. 2) Make VariableResolverImpl#TEMPLATE_STRING non-static. Or clear cache on finishing delta-import. I think that #1 is better for performance than #2, but #2 is easier to solve the problem. I made a patch in #2 way, and then tested two solr applications with `-XX:+PrintClassHistgram' option. The result after importing several million rows from a MySQL database is as follows: * original solr-1.3: num #instances #bytes class name ---------------------------------------------- ... 6: 2983024 119320960 org.apache.solr.handler.dataimport.TemplateString ... * patched solr-1.3: num #instances #bytes class name ---------------------------------------------- ... 748: 3 120 org.apache.solr.handler.dataimport.TemplateString ... Though it is version 1.3 that I tested, perhaps current nightly version has same problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.