CachedSQLentity processor is using unbounded hashmap 
-----------------------------------------------------

                 Key: SOLR-1867
                 URL: https://issues.apache.org/jira/browse/SOLR-1867
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler
    Affects Versions: 1.4
            Reporter: barani


I am using CachedSqlEntityProcessor in DIH to index data. Here is a sample 
data-config structure:

<entity name="x" query="select * from x">  <!-- object -->
  <entity name="y" query="select * from y" processor="CachedSqlEntityProcessor"
          cachekey="y.id" cachevalue="x.id"/>  <!-- object properties -->
</entity>

For every object, I retrieve the corresponding object properties (in my 
subqueries). 

I run into OOM errors very often, and I think that is the trade-off of using 
CachedSqlEntityProcessor. 

My assumption is that with CachedSqlEntityProcessor the indexing happens as 
follows:

First, entity x is executed and the entire table is stored in the cache.
Next, entity y is executed and its entire table is stored in the cache.
Finally, the comparison happens through the hash map.

So the memory allocated to the Solr JVM always needs to be greater than or 
equal to the size of the data in the tables.
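
The assumed flow above can be sketched in a few lines of Java. This is only 
an illustration of the "load everything, then look up by key" pattern, not the 
DataImportHandler source; the class name ChildRowCache and its methods are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the caching pattern; not Solr code.
class ChildRowCache {
    // Join key -> all child rows sharing that key. Nothing is ever evicted,
    // so the map grows with the size of the table.
    private final Map<Object, List<Map<String, Object>>> rowsByKey = new HashMap<>();

    // Load every row of the child result set into memory, grouped by key column.
    void load(List<Map<String, Object>> allRows, String keyColumn) {
        for (Map<String, Object> row : allRows) {
            rowsByKey.computeIfAbsent(row.get(keyColumn), k -> new ArrayList<>()).add(row);
        }
    }

    // Called once per parent row: an in-memory lookup instead of a SQL query.
    List<Map<String, Object>> lookup(Object parentKey) {
        return rowsByKey.getOrDefault(parentKey, Collections.emptyList());
    }

    // Number of distinct join keys currently held in memory.
    int cachedKeys() {
        return rowsByKey.size();
    }
}
```

With this pattern the heap has to hold every cached row at once, which matches 
the memory requirement described above.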

One more issue is that even after Solr completes indexing, the memory used 
previously is not released. I could still see the JVM consuming 1.5 GB after 
indexing completed. I tried Java HotSpot options but did not see any 
difference. GC is not invoked even after a long time when using 
CachedSqlEntityProcessor.
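
One way to narrow this down is to compare the heap actually in use with the 
heap the JVM has reserved, since the process size seen from the operating 
system may reflect the latter. The sketch below uses only the standard 
java.lang.Runtime API; the class name HeapReport is hypothetical, and it 
would have to run inside the same JVM as Solr to say anything about it.

```java
// Hypothetical helper; uses only java.lang.Runtime.
class HeapReport {
    // Bytes currently occupied by objects (live or not yet collected).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Bytes the JVM has currently reserved for the heap.
    static long committedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Upper limit the heap may grow to (controlled by -Xmx).
    static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.gc(); // only a request; the JVM is free to ignore it
        long mb = 1024L * 1024L;
        System.out.printf("used=%d MB, committed=%d MB, max=%d MB%n",
                usedBytes() / mb, committedBytes() / mb, maxBytes() / mb);
    }
}
```

If the used figure stays high after a collection, something is still holding 
references to the cached rows; if only the committed figure stays high, the 
memory has been freed inside the JVM but not returned to the operating system.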

The main issue seems to be that the CachedSqlEntityProcessor cache is an 
unbounded HashMap, with no option to bound it. 
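
For comparison, a bounded cache can be built on the JDK's LinkedHashMap by 
overriding removeEldestEntry. This is a minimal sketch of what bounding could 
look like, not a proposed patch; the class name BoundedLruCache and the 
maxEntries parameter are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache that never holds more than maxEntries entries.
class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        // Third argument true = iterate in access order (least recently used first).
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Invoked by put(); returning true evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

A lookup that misses such a cache would have to fall back to querying the 
database for that key, so the bound trades memory for extra queries.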

Reference: 
http://n3.nabble.com/Need-info-on-CachedSQLentity-processor-tt698418.html#a698418


