Memory leaks in DIH
-------------------

                 Key: SOLR-1042
                 URL: https://issues.apache.org/jira/browse/SOLR-1042
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler
    Affects Versions: 1.3
            Reporter: Ryuuichi Kumai
         Attachments: SOLR-1042.patch

If delta-import is executed many times, heap usage keeps growing until an 
OutOfMemoryError finally occurs.

When delta-import is executed with SqlEntityProcessor, instances of 
TemplateString are cached in VariableResolverImpl#TEMPLATE_STRING#cache.
If the deltaQuery contains a variable such as `last_index_time', the number 
of cached entries that are never used again keeps increasing.
Similarly, I suspect the cache also grows each time a modified row is 
fetched by its primary key.
I think these queries should not be cached.
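The growth described above can be modelled with a small, self-contained 
sketch. It assumes, as the report suggests, that the static cache is keyed 
by query text that already contains a runtime value (the last index time or 
a primary key), so every delta-import adds keys that are never looked up 
again. StaticTemplateCacheDemo, ParsedTemplate and the `item' table are 
hypothetical stand-ins, not DIH code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the leak, not the DIH source. ParsedTemplate stands in
 * for TemplateString and CACHE for VariableResolverImpl#TEMPLATE_STRING#cache.
 */
public class StaticTemplateCacheDemo {

    /** Stand-in for a parsed TemplateString. */
    static final class ParsedTemplate {
        final String text;

        ParsedTemplate(String text) {
            this.text = text;
        }
    }

    /** Static and unbounded: entries stay reachable for the life of the class. */
    static final Map<String, ParsedTemplate> CACHE = new HashMap<>();

    static ParsedTemplate parse(String query) {
        // Assumption: the key is the query text after runtime values were
        // substituted, so every new timestamp or primary key adds an entry.
        return CACHE.computeIfAbsent(query, ParsedTemplate::new);
    }

    /** Simulates delta-imports and returns the cache size afterwards. */
    static int simulate(int runs, int modifiedRowsPerRun) {
        for (int run = 0; run < runs; run++) {
            // "run" stands in for a different last_index_time value on each run.
            parse("select id from item where last_modified > '" + run + "'");
            // Each modified row is then fetched by its primary key.
            for (int i = 0; i < modifiedRowsPerRun; i++) {
                int id = run * modifiedRowsPerRun + i;
                parse("select * from item where id = " + id);
            }
        }
        return CACHE.size();
    }

    public static void main(String[] args) {
        // 1000 runs x (1 delta query + 3 row queries): 4000 entries, none reused.
        System.out.println(simulate(1000, 3)); // prints 4000
    }
}
```

Under these assumptions the cache grows linearly with the number of 
imports and modified rows, which matches the shape of the problem reported 
here, though not its exact numbers.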

I came up with two solutions:

 1) Do not cache the queries used to fetch modified rows.
 2) Make VariableResolverImpl#TEMPLATE_STRING non-static, or clear the cache 
when delta-import finishes.
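A minimal sketch of solution #2, combining both of its variants: the cache 
is owned by a resolver instance rather than a static field, and it is 
emptied when a delta-import finishes. Resolver, onImportFinished() and the 
queries are hypothetical names for illustration; this is not the code in 
SOLR-1042.patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of fix #2; not the code in SOLR-1042.patch. */
public class ScopedTemplateCacheDemo {

    /** Stand-in for a parsed TemplateString. */
    static final class ParsedTemplate {
        final String text;

        ParsedTemplate(String text) {
            this.text = text;
        }
    }

    /** Resolver that owns its cache instead of sharing a static one. */
    static final class Resolver {
        // Non-static: becomes unreachable together with the resolver.
        private final Map<String, ParsedTemplate> cache = new HashMap<>();

        ParsedTemplate parse(String query) {
            return cache.computeIfAbsent(query, ParsedTemplate::new);
        }

        int cacheSize() {
            return cache.size();
        }

        /** Alternative: empty the cache when a delta-import finishes. */
        void onImportFinished() {
            cache.clear();
        }
    }

    /** Runs one simulated delta-import and returns the cache size at its peak. */
    static int runDeltaImport(Resolver resolver, int run, int modifiedRows) {
        resolver.parse("select id from item where last_modified > '" + run + "'");
        for (int i = 0; i < modifiedRows; i++) {
            resolver.parse("select * from item where id = " + (run * modifiedRows + i));
        }
        int peak = resolver.cacheSize();
        resolver.onImportFinished();
        return peak;
    }

    public static void main(String[] args) {
        // Even a resolver reused across 1000 imports ends up with an empty cache.
        Resolver resolver = new Resolver();
        for (int run = 0; run < 1000; run++) {
            runDeltaImport(resolver, run, 3);
        }
        System.out.println(resolver.cacheSize()); // prints 0
    }
}
```

Making the field non-static is enough on its own if each import gets a 
fresh resolver; clearing on finish also covers a resolver that is reused.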

I think #1 performs better than #2, but #2 is the easier way to fix the 
problem.
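Solution #1 can be sketched as selective caching: templates whose text is 
the same on every run stay in the shared cache and are parsed only once, 
while the one-off queries for modified rows are parsed fresh and become 
garbage after use. The `cacheable' flag and the class names are 
hypothetical; how DIH would decide which queries to skip is left open here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of fix #1; names are illustrative only. */
public class SelectiveTemplateCacheDemo {

    /** Stand-in for a parsed TemplateString. */
    static final class ParsedTemplate {
        final String text;

        ParsedTemplate(String text) {
            this.text = text;
        }
    }

    /** Shared cache, now holding only templates that are reused. */
    static final Map<String, ParsedTemplate> CACHE = new HashMap<>();

    static ParsedTemplate parse(String query, boolean cacheable) {
        if (!cacheable) {
            // Parsed on every call; nothing keeps a reference afterwards.
            return new ParsedTemplate(query);
        }
        return CACHE.computeIfAbsent(query, ParsedTemplate::new);
    }

    /** Simulates delta-imports and returns the cache size afterwards. */
    static int simulate(int runs, int modifiedRowsPerRun) {
        for (int run = 0; run < runs; run++) {
            // Identical text on every run: worth caching.
            parse("select * from item_type", true);
            // Queries that embed runtime values bypass the cache.
            parse("select id from item where last_modified > '" + run + "'", false);
            for (int i = 0; i < modifiedRowsPerRun; i++) {
                parse("select * from item where id = " + (run * modifiedRowsPerRun + i), false);
            }
        }
        return CACHE.size();
    }

    public static void main(String[] args) {
        System.out.println(simulate(1000, 3)); // prints 1
    }
}
```

This keeps the benefit of caching for reusable templates, which is why #1 
should be cheaper at run time than clearing or discarding the whole cache.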

I wrote a patch using approach #2 and then compared the original and 
patched Solr applications with the `-XX:+PrintClassHistogram' option.
After importing several million rows from a MySQL database, the results 
were as follows:

 * original solr-1.3:
 num     #instances         #bytes  class name
-----------------------------------------------------------------------------
...
   6:       2983024      119320960  org.apache.solr.handler.dataimport.TemplateString
...

 * patched solr-1.3:
 num     #instances         #bytes  class name
-----------------------------------------------------------------------------
...
 748:             3            120  org.apache.solr.handler.dataimport.TemplateString
...

Although I tested version 1.3, the current nightly build probably has the 
same problem.

-- 
This message is automatically generated by JIRA.