On 2/4/2016 12:18 AM, Srinivas Kashyap wrote:
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the 
> child entities in data-config.xml. When i try to do full import, i'm getting 
> OutOfMemory error(Java Heap Space). I increased the HEAP allocation to the 
> maximum extent possible. Is there a workaround to do initial data load 
> without running into this error?
>
> I found that 'batchSize=-1' parameter needs to be specified in the datasource 
> for MySql, is there a way to specify for others Databases as well?

Setting batchSize to -1 in the DIH config translates to a 'setFetchSize'
on the JDBC object of Integer.MIN_VALUE.  This is how to turn on result
streaming in MySQL.

The method for doing this with other JDBC implementations is likely to
be different.  The Microsoft driver for SQL Server uses a URL parameter,
and newer versions of that particular driver have the streaming behavior
as default.  I have no idea how to do it for any other driver, you would
need to ask the author of the driver.

When you turn on caching (SortedMapBackedCache), you are asking Solr to
put all of the data received into memory -- very similar to what happens
if result streaming is not turned on.  When the SQL result is very
large, this can require a LOT of memory.  In situations like that,
you'll just have to remove the caching.  One alternative to child
entities is to do a query using JOIN in a single entity, so that all the
data you need is returned by a single SQL query, where the heavy lifting
is done by the database server instead of Solr.

The MySQL database that serves as the information source for *my* Solr
index is hundreds of gigabytes in size, so caching it is not possible
for me.  The batchSize=-1 option is the only way to get the import to work.

Thanks,
Shawn

Reply via email to