Siddharth,

The settings you have in your solrconfig for ramBufferSizeMB and
maxBufferedDocs control how much memory may be used during indexing, beyond
the overhead of documents that are "in-flight" at a given moment
(deserialized into memory but not yet handed to Lucene).  There are
streaming versions of the client/server that help with that as well by
processing documents as they arrive instead of accumulating them.
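For reference, the two knobs mentioned above live in solrconfig.xml; the values below are illustrative, not recommendations:

```xml
<!-- solrconfig.xml, inside <indexDefaults> (or <mainIndex>) -->
<!-- flush the in-memory indexing buffer once it holds ~64 MB of docs -->
<ramBufferSizeMB>64</ramBufferSizeMB>
<!-- or flush after this many buffered documents, whichever comes first -->
<maxBufferedDocs>10000</maxBufferedDocs>
```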

The patch SOLR-1155 does not add memory use; rather, it lets the threads
proceed through to Lucene without blocking inside Solr as often.  So
instead of stuck threads holding documents in memory, you have moving
threads holding the same documents.

So the buffer sizes mentioned above, along with the number of documents you
send at a time, determine your memory footprint.  Send smaller batches (less
efficient) or stream; or make sure you have enough memory for the number of
docs you send at a time.
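The batch-size trade-off can be handled on the client side. As a minimal sketch (a hypothetical helper, not part of any Solr client library), slicing your document list into fixed-size batches bounds how many documents are in flight at once, at the cost of more round trips:

```python
def in_batches(docs, batch_size):
    """Yield successive slices of docs, batch_size at a time.

    Smaller batches mean fewer documents held in memory on the
    Solr side at any moment, at the cost of more HTTP requests."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]

# e.g. post each batch to the update handler instead of everything at once:
# for batch in in_batches(all_docs, 500):
#     solr_post(batch)   # hypothetical HTTP helper
```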

For indexing I slow my commits down if there is no need for the documents to
become available for query right away.  For pure indexing, a long autoCommit
time and a large max document count before auto-committing help.  Committing
isn't what flushes documents out of memory; it is what makes the on-disk
version part of the overall index.  Over-committing will slow you way down,
especially if you have any listeners on the commits doing a lot of work
(e.g. Solr distribution).
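Concretely, the autoCommit section of solrconfig.xml is where you slow this down; the numbers here are just placeholders for a bulk-indexing setup:

```xml
<!-- solrconfig.xml: commit rarely during bulk indexing -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>100000</maxDocs>  <!-- commit after this many adds -->
    <maxTime>600000</maxTime>  <!-- or after 10 minutes (value in ms) -->
  </autoCommit>
</updateHandler>
```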

Also, if you are querying on the indexer, that can eat memory and compete
with the memory you are trying to reserve for indexing.  Splitting indexing
and querying across different instances lets you tune each for its workload,
but then the trade-off is a gap in time between indexing and querying.

It is hard to say what is going on with GC without knowing what garbage
collection settings you are passing to the VM, and what version of the Java
VM you are using.  Which garbage collector are you using and what tuning
parameters?

I tend to use the parallel GC on my indexers with the GC overhead limit
turned off, accepting some pauses (which users don't see on a back-end
indexer) in exchange for good collection and lower heap fragmentation.  On
my query slaves I tend to use the concurrent mark-and-sweep collector with
incremental mode and pacing tuned: a low-pause collector that takes
advantage of the cores on my servers and can incrementally keep up with the
needs of a query slave.
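As a rough sketch, the JVM flags for those two setups might look like the following (heap sizes and the start.jar invocation are illustrative; tune for your own hardware):

```shell
# indexer: throughput (parallel) collector, overhead limit off
java -server -Xmx6g -XX:+UseParallelGC -XX:-UseGCOverheadLimit -jar start.jar

# query slave: CMS with incremental mode and pacing for low pauses
java -server -Xmx4g -XX:+UseConcMarkSweepGC \
     -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -jar start.jar
```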

-- Jayson


Gargate, Siddharth wrote:
> 
> Hi all,
>       I am also facing the same issue where autocommit blocks all
> other requests. I am having around 100,000 documents with an average size
> of 100K each. It took more than 20 hours to index.
> I have currently set autocommit maxtime to 7 seconds, mergeFactor to 25.
> Do I need more configuration changes?
> Also I see that memory usage goes to peak level of heap specified(6 GB
> in my case). Looks like Solr spends most of the time in GC. 
> According to my understanding, fix for Solr-1155 would be that commit
> will run in background and new documents will be queued in the memory.
> But I am afraid of the memory consumption by this queue if commit takes
> much longer to complete.
> 
> Thanks,
> Siddharth
> 
> 
-- 
View this message in context: 
http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23540569.html
Sent from the Solr - User mailing list archive at Nabble.com.