A couple of things:

1) Can you give some more details about your setup? For example, whether it's cloud or a single instance, and how many nodes if it's cloud. Also the hardware: memory per machine, JVM options, etc.
2) Any specific reason for using 4.0 beta? The latest version is 4.3. I used 4.0 for a few weeks and there were a lot of bugs related to memory and to communication between nodes (ZooKeeper).
3) If you haven't seen it already, please go through this wiki page. It's an excellent starting point for troubleshooting memory and indexing issues, especially sections 3 to 7:
http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations

--
Shreejay

On Sunday, June 2, 2013 at 7:16, Yoni Amir wrote:

> Hello,
> I am receiving OutOfMemoryError during indexing, and after investigating the
> heap dump, I am still missing some information, and I thought this might be a
> good place for help.
>
> I am using Solr 4.0 beta, and I have 5 threads that send update requests to
> Solr. Each request is a bulk of 100 SolrInputDocuments (using solrj), and my
> goal is to index around 2.5 million documents.
> Solr is configured to do a hard-commit every 10 seconds, so initially I
> thought that it can only accumulate in memory 10 seconds worth of updates,
> but that's not the case. I can see in a profiler how it accumulates memory
> over time, even with 4 to 6 GB of memory. It is also configured to optimize
> with mergeFactor=10.
>
> At first I thought that optimization is a blocking, synchronous operation. It
> is, in the sense that the index can't be updated during optimization.
> However, it is not synchronous, in the sense that the update request coming
> from my code is not blocked - Solr just returns an OK response, even while
> the index is optimizing.
> This indicates that Solr has an internal queue of inbound requests, and that
> the OK response just means that it is in the queue. I get confirmation for
> this from a friend who is a Solr expert (or so I hope).
>
> My main question is: how can I put a bound on this internal queue, and make
> update requests synchronous in case the queue is full?
> Put it another way, I need to know if Solr is really ready to receive more
> requests, so I don't overload it and cause OOME.
>
> I performed several tests, with slow and fast disks, and on the really fast
> disks the problem didn't occur. However, I can't demand such fast disks from
> all the clients, and also even with a fast disk the problem will occur
> eventually when I try to index 10 million documents.
> I also tried to perform indexing with optimization disabled, but it didn't
> help.
>
> Thanks,
> Yoni
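
On the question in the quoted message about putting a bound on outstanding update requests: whatever Solr does internally, the client can at least cap how many batches it has in flight at once. Below is a minimal sketch of that idea, not a statement about how Solr queues requests. `BoundedIndexer` and `BatchSender` are hypothetical names invented for this example; in a real indexer the `send` call would presumably wrap the SolrJ call that posts one batch of 100 documents, which is left out here so the sketch depends only on the JDK.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Client-side backpressure: at most maxInFlight batches are outstanding at any time. */
final class BoundedIndexer<D> {

    /** Hypothetical stand-in for the real "send one batch to the server" call. */
    interface BatchSender<T> {
        void send(List<T> batch) throws Exception;
    }

    private final BatchSender<D> sender;
    private final ExecutorService pool;
    private final Semaphore permits;
    private final AtomicInteger failures = new AtomicInteger();

    BoundedIndexer(BatchSender<D> sender, int threads, int maxInFlight) {
        this.sender = sender;
        this.pool = Executors.newFixedThreadPool(threads);
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks the caller while maxInFlight batches are already queued or being sent. */
    void submit(List<D> batch) throws InterruptedException {
        permits.acquire();
        try {
            pool.execute(() -> {
                try {
                    sender.send(batch);
                } catch (Exception e) {
                    // Count the failure; a real indexer would log and retry or abort.
                    failures.incrementAndGet();
                } finally {
                    // Free the slot only after the send call has returned.
                    permits.release();
                }
            });
        } catch (RuntimeException e) {
            // The task was rejected, so it will never release the permit itself.
            permits.release();
            throw e;
        }
    }

    int failureCount() {
        return failures.get();
    }

    /** Stops accepting work and waits for outstanding batches to finish. */
    boolean close(long timeoutSeconds) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    }
}
```

With 5 sender threads as in the quoted setup, something like `new BoundedIndexer<>(sender, 5, 5)` would make the producer block as soon as all five threads are busy, so the client never holds more than five batches in memory. This only limits what the client has outstanding; it does not help if the server acknowledges a request before it has finished the work for it.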