On 5/21/2013 9:22 PM, Umesh Prasad wrote:
> This is our own implementation of a data source (canonical name
> com.flipkart.w3.solr.MultiSPCMSProductsDataSource), which pulls the data
> from our downstream service and doesn't cache data in RAM. It fetches
> the data in batches of 200 and iterates over it when DIH asks for it. I
> will check for the possibility of a leak, but it's unlikely.
> Can the OOM issue be because, during analysis, IndexWriter finds the
> document to be too large to fit in 100 MB of memory and can't flush to
> disk? Our analyzer chain doesn't make it easy (especially with a field
> that does a cross product of synonym terms).
If your documents are really large (hundreds of KB, or a few MB), you might need a bigger ramBufferSizeMB value ... but if that were causing problems, I would expect it to show up during import, not at commit time.

How much of your 32GB heap is in use during indexing? Would you be able to try with the heap at 31GB instead of 32GB? One of Java's default optimizations (UseCompressedOops) gets turned off at a heap size of 32GB because it no longer works, and that might lead to strange things happening.

Do you have the ability to try 4.3 instead of 4.2.1?

Thanks,
Shawn
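For reference, the ramBufferSizeMB setting mentioned above is configured in solrconfig.xml. A minimal sketch for Solr 4.x is below; 100 is the default value discussed in this thread, and the 256 shown here is purely an illustrative alternative, not a recommendation:

```
<!-- solrconfig.xml (Solr 4.x): indexing RAM buffer for IndexWriter.
     The default is 100 MB; 256 here is only an example value. -->
<indexConfig>
  <ramBufferSizeMB>256</ramBufferSizeMB>
</indexConfig>
```

To check whether UseCompressedOops is still in effect at a given heap size, something like `java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops` can be used; with `-Xmx32g`, 64-bit HotSpot JVMs typically report it as false.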