CachedSQLentity processor is using unbounded hashmap
-----------------------------------------------------
Key: SOLR-1867
URL: https://issues.apache.org/jira/browse/SOLR-1867
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: barani
I am using CachedSqlEntityProcessor in DIH to index data. Here is a sample
data-config structure:
<entity name="x" query="select * from x">  <!-- object -->
  <entity name="y" query="select * from y" processor="CachedSqlEntityProcessor"
          cachekey="y.id" cachevalue="x.id"/>  <!-- object properties -->
</entity>
For each object I retrieve the corresponding object properties (in my
subqueries).
I run into OOM errors very often, and I think that is the trade-off of using
CachedSqlEntityProcessor.
My assumption is that when I use CachedSqlEntityProcessor, indexing happens as
follows:
1. Entity x is executed first and the entire table is stored in the cache.
2. Entity y is executed next and its entire table is stored in the cache.
3. Finally, the comparison happens through a hash map.
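The lookup step assumed above can be sketched in plain Java. This is only an
illustration of the general technique (load every child row once, group by the
join key, then look up per parent row); it is not the DIH implementation, and
the class name, method names, and the "id"/"property" columns are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an in-memory cached join; not actual DIH code.
class CachedJoinSketch {

    // Helper that builds one fake row of table y (column names are assumptions).
    static Map<String, Object> row(int id, String property) {
        Map<String, Object> r = new HashMap<>();
        r.put("id", id);
        r.put("property", property);
        return r;
    }

    // Loads ALL child rows into a HashMap grouped by the join key.
    // Nothing limits the map size, so memory grows with the table.
    static Map<Object, List<Map<String, Object>>> buildCache(
            List<Map<String, Object>> childRows, String keyColumn) {
        Map<Object, List<Map<String, Object>>> cache = new HashMap<>();
        for (Map<String, Object> r : childRows) {
            cache.computeIfAbsent(r.get(keyColumn), k -> new ArrayList<>()).add(r);
        }
        return cache;
    }

    // For one parent row, fetches its child rows from the cache instead of the database.
    static List<Map<String, Object>> lookup(
            Map<Object, List<Map<String, Object>>> cache, Object parentKey) {
        return cache.getOrDefault(parentKey, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> yRows = new ArrayList<>();
        yRows.add(row(1, "red"));
        yRows.add(row(1, "large"));
        yRows.add(row(2, "blue"));

        Map<Object, List<Map<String, Object>>> cache = buildCache(yRows, "id");
        // Each parent key is resolved with a hash lookup rather than a new query.
        System.out.println(lookup(cache, 1));
        System.out.println(lookup(cache, 3));
    }
}
```

Under this model the cache holds one entry per child row, which is consistent
with the memory growth described in this report.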
So I always need to allocate at least as much memory to the Solr JVM as the
size of the data in the tables.
Another issue is that even after Solr completes indexing, the memory used
during indexing is not released. I could still see the JVM consuming 1.5 GB
after indexing completed. I tried several Java HotSpot options but did not see
any difference. GC does not appear to be invoked, even after a long time, when
using CachedSqlEntityProcessor.
The main issue seems to be that the CachedSqlEntityProcessor cache is an
unbounded HashMap, with no option to bound it.
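For comparison, one common way to bound such a cache in Java is an
access-ordered LinkedHashMap that evicts its least recently used entry. The
sketch below is an assumption about what a bounded cache could look like, not
an existing DIH option; the class name and the maxEntries parameter are
hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded LRU cache; not part of DIH.
class BoundedCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    BoundedCacheSketch(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCacheSketch<String, Integer> cache = new BoundedCacheSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // marks "a" as recently used
        cache.put("c", 3); // exceeds the limit, so the least recently used key is evicted
        System.out.println(cache.keySet());
    }
}
```

The trade-off is that an evicted key has to be fetched from the database
again, so a bound exchanges memory for extra queries.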
Reference:
http://n3.nabble.com/Need-info-on-CachedSQLentity-processor-tt698418.html#a698418