On 2/4/2016 12:18 AM, Srinivas Kashyap wrote:
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
> child entities in data-config.xml. When i try to do full import, i'm getting
> OutOfMemory error(Java Heap Space). I increased the HEAP allocation to the
> maximum extent possible. Is there a workaround to do initial data load
> without running into this error?
>
> I found that 'batchSize=-1' parameter needs to be specified in the datasource
> for MySql, is there a way to specify for others Databases as well?

Setting batchSize to -1 in the DIH config translates to a setFetchSize(Integer.MIN_VALUE) call on the JDBC statement. This is how to turn on result streaming in MySQL. The method for doing this with other JDBC implementations is likely to be different. The Microsoft driver for SQL Server uses a URL parameter, and newer versions of that particular driver have the streaming behavior as default. I have no idea how to do it for any other driver; you would need to ask the author of the driver.

When you turn on caching (SortedMapBackedCache), you are asking Solr to put all of the data received into memory -- very similar to what happens if result streaming is not turned on. When the SQL result is very large, this can require a LOT of memory. In situations like that, you'll just have to remove the caching.

One alternative to child entities is to do a query using JOIN in a single entity, so that all the data you need is returned by a single SQL query, where the heavy lifting is done by the database server instead of Solr.

The MySQL database that serves as the information source for *my* Solr index is hundreds of gigabytes in size, so caching it is not possible for me. The batchSize=-1 option is the only way to get the import to work.

Thanks,
Shawn
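
P.S. For reference, here is a minimal Java sketch of what the batchSize translation described above amounts to at the JDBC level. The class and method names (StreamingFetch, toFetchSize, streamingStatement) are made up for illustration and are not Solr's actual code. The forward-only, read-only statement is what MySQL Connector/J expects before it will stream rows one at a time. Other drivers may reject a negative fetch size outright, since the JDBC API only promises to accept values of zero or more.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, for illustration only; not part of Solr or any driver.
class StreamingFetch {

    // Maps the DIH-style batchSize to a JDBC fetch size:
    // -1 becomes Integer.MIN_VALUE, anything else is passed through unchanged.
    static int toFetchSize(int batchSize) {
        return batchSize == -1 ? Integer.MIN_VALUE : batchSize;
    }

    // Creates a forward-only, read-only statement and applies the fetch size.
    // With MySQL Connector/J and Integer.MIN_VALUE this requests row-by-row
    // streaming instead of buffering the whole result set in memory.
    static Statement streamingStatement(Connection conn, int batchSize)
            throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(toFetchSize(batchSize));
        return stmt;
    }
}
```

You would call streamingStatement(conn, -1) on an open MySQL connection and then iterate the ResultSet as usual. Keep in mind that while a streaming result set is open, that connection cannot be used for other queries.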