On 6/15/2018 3:14 PM, sujatha sankaran wrote:
We were initially having an issue with DBQ (delete by query) and heavy
batch updates, which resulted in many missing updates.
After reading many mails on the mailing list mentioning that DBQ and batch
updates do not work well together, we switched to DBI (delete by ID). But we
are now seeing the problem described in this Jira issue:
https://issues.apache.org/jira/browse/SOLR-7384
If you're using the implicit router on your multi-shard collection,
deleting by ID may not work for you. There are a number of issues in
Jira discussing various aspects of the problem. On a collection using
the compositeId router, I would expect those deletes to work well.
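For reference, a minimal sketch of what a delete-by-ID request to the
/update handler looks like in JSON form. The base URL, the collection name
"mycollection", and the document IDs are placeholders, and the request is
only built here, not sent:

```python
import json
import urllib.request


def build_delete_by_id_request(base_url, collection, ids):
    """Build (but do not send) a JSON delete-by-ID request for Solr's /update handler."""
    url = f"{base_url}/{collection}/update"
    # Solr's JSON update format accepts a list of IDs under the "delete" key.
    body = json.dumps({"delete": list(ids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host, collection, and IDs (assumptions, not from this thread).
req = build_delete_by_id_request(
    "http://localhost:8983/solr", "mycollection", ["doc1", "doc2"]
)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen() would actually send it.
With the compositeId router, Solr can work out the target shard from the ID
itself, which is why I would expect this form of delete to behave.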
Specifically, we are seeing the following pattern:
· There are several ERRORs and WARNs along the lines of “missing _version_”.
· There is typically a single ERROR message.
· Several WARNs follow, and after a couple of WARNs there is a message
that leader-initiated recovery has been kicked off.
Can you share these log entries? The message on some of them is
probably a dozen or more lines long, and may have multiple "Caused by"
clauses that will also need to be included. Seeing the whole log could
be useful.
*Setup info*:
- SolrCloud 6.6.2
- 5 nodes, 5 shards, 3 replicas
- ~35 million docs in the collection
- Nodes have 90GB RAM, 32GB given to the JVM
- Soft commit interval 2 seconds, hard commit (openSearcher=false) 15
seconds
Side notes:
Solr would actually have more heap memory available if you set the heap
to 31GB instead of 32GB.
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
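The arithmetic behind that, as a rough sketch: below roughly 32GB the JVM
can use compressed object pointers, which are 32-bit offsets scaled by the
default 8-byte object alignment. The constant and function names are only
for illustration, and the exact cutoff in a real JVM sits slightly under the
theoretical figure, which is why 31GB is the safer setting:

```python
# Compressed object pointers are 32-bit offsets scaled by the JVM's
# default 8-byte object alignment.
POINTER_BITS = 32
OBJECT_ALIGNMENT_BYTES = 8


def compressed_oops_limit_bytes(pointer_bits=POINTER_BITS,
                                alignment=OBJECT_ALIGNMENT_BYTES):
    """Largest heap size (in bytes) addressable with compressed pointers."""
    return (2 ** pointer_bits) * alignment


limit_gib = compressed_oops_limit_bytes() / 2 ** 30
print(f"Theoretical compressed-pointer limit: {limit_gib:.0f} GiB")
```

At or above that limit every object reference takes 8 bytes instead of 4,
so part of the extra heap is spent on larger pointers.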
A 2 second soft commit interval is extremely aggressive. If your soft
commits are happening really quickly (far less than 1 second) then this
might not be a problem, but with an index as large as yours, it is very
likely that soft commits are taking much longer than 2 seconds.
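One way to check this is to compare commit times taken from your logs
against the configured interval. A minimal sketch, where the function name
is hypothetical and the durations are made-up placeholders rather than
measurements from any real cluster:

```python
def commits_exceeding_interval(durations_s, interval_s=2.0):
    """Return the commit durations that are longer than the soft commit interval."""
    return [d for d in durations_s if d > interval_s]


# Placeholder values for illustration only; substitute times from your own logs.
sample_durations_s = [0.4, 1.8, 3.5, 6.2]
slow = commits_exceeding_interval(sample_durations_s, interval_s=2.0)
print(f"{len(slow)} of {len(sample_durations_s)} soft commits "
      "took longer than the interval")
```

If a large share of commits land in the "slow" list, the interval is
shorter than the commits themselves and should be raised.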
Thanks,
Shawn