On Jul 15, 2012, at 2:45 PM, Nick Koton wrote:

> I converted my program to use the
> SolrServer::add(Collection<SolrInputDocument> docs) method with 100
> documents in each add batch. Unfortunately, the out of memory errors still
> occur without client side commits.
This won't change much, unfortunately. Currently, each host has a buffer of 10 adds and 10 deletes; once full, the buffer flushes. There are some recovery implications that have kept that buffer size low so far, but what it ends up meaning is that when you stream docs, every 10 docs is sent off on a thread. Generally you might be able to keep up with this, but the commit cost appears to cause a small resource drop that backs things up a bit: some of those threads take a little longer to finish while new threads fire off to keep servicing the constantly arriving new documents. What appears to happen is large momentary spikes in the number of threads. Each thread needs a bit of space on the heap, so with a high enough spike you could get an OOM. In my testing I have not triggered that yet, but I have seen large thread count spikes.

Raising the add-doc buffer to 100 docs makes those thread bursts much, much less severe. I can't remember all of the implications of that buffer size though - I need to talk to Yonik about it. We could limit the number of threads for that executor, but I think that comes with some negatives as well.

As a shorter-term (possible) workaround, you could try lowering -Xss so that each thread uses less stack memory (if possible). You could also use multiple threads with the std HttpSolrServer - it probably won't be quite as fast, but it can get close(ish).

My guess is that your client commits help because a commit waits on all outstanding requests - so that the commit is in logical order. This is probably like releasing a pressure valve: the system gets a chance to catch up and reclaim lots of threads.

We will keep looking into what the best improvement is.

- Mark Miller
lucidimagination.com
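A minimal sketch of that multi-threaded workaround, as I understand it: partition the documents into batches and submit them to a fixed-size pool, so the number of in-flight requests is capped by the pool rather than spiking with incoming docs. The `sendBatch` method here is a hypothetical stand-in for `httpSolrServer.add(batch)` (the actual SolrJ call is omitted so the sketch stays self-contained); everything else is plain JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedIndexer {

    // Hypothetical stand-in for httpSolrServer.add(batch). In a real client
    // this would issue the HTTP update request for the whole batch.
    static void sendBatch(List<String> batch) {
        // no-op placeholder
    }

    // Split docs into batches of batchSize and index them on a bounded pool.
    // A fixed pool caps concurrent requests, unlike the streaming client's
    // unbounded per-10-doc thread bursts described above.
    public static int indexAll(List<String> docs, int batchSize, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            final List<String> batch =
                new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size())));
            futures.add(pool.submit(() -> {
                sendBatch(batch);
                return batch.size();
            }));
        }
        pool.shutdown();
        int total = 0;
        for (Future<Integer> f : futures) {
            total += f.get(); // also surfaces any exception from a batch
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1050; i++) docs.add("doc-" + i);
        System.out.println(indexAll(docs, 100, 4)); // prints 1050
    }
}
```

With a pool of 4 threads and 100-doc batches, at most 4 requests are in flight at a time, which is the property the streaming client currently lacks.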