Thanks, Shawn. Very useful information. Please find the log details below:
2018-06-20 17:19:06.661 ERROR (updateExecutor-2-thread-8226-processing-crm_v2_01_shard3_replica1 x:crm_v2_01_shard3_replica2 r:core_node4 n:masked:8983_solr s:shard3 c:crm_v2_01) [c:crm_v2_01 s:shard3 r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.u.StreamingSolrClients error
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at crm_v2_01_shard3_replica1: Bad Request
request: crm_v2_01_shard3_replica1/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2
Remote error message: missing _version_ on update from leader
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-06-20 17:19:06.662 WARN (qtp1002191352-169102) [c:crm_v2_01 s:shard3 r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.u.p.DistributedUpdateProcessor Error sending update to http://masked:8983/solr
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://masked:8983/solr/crm_v2_01_shard3_replica3: Bad Request
request: http://masked:8983/solr/crm_v2_01_shard3_replica3/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2
Remote error message: missing _version_ on update from leader
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-06-20 17:19:06.662 ERROR (qtp1002191352-169102) [c:crm_v2_01 s:shard3 r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on replica http://masked:8983/solr/crm_v2_01_shard3_replica3/
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://masked:8983/solr/crm_v2_01_shard3_replica3: Bad Request
request: http://masked:8983/solr/crm_v2_01_shard3_replica3/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2
Remote error message: missing _version_ on update from leader
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-06-20 17:19:06.662 INFO (qtp1002191352-169102) [c:crm_v2_01 s:shard3 r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.c.ZkController Put replica core=crm_v2_01_shard3_replica3 coreNodeName=core_node12 on masked:8983_solr into leader-initiated recovery.
2018-06-20 17:19:06.662 WARN (qtp1002191352-169102) [c:crm_v2_01 s:shard3 r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.u.p.DistributedUpdateProcessor Error sending update to http://masked:8983/solr
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at crm_v2_01_shard3_replica1: Bad Request
request: crm_v2_01_shard3_replica1/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2
Remote error message: missing _version_ on update from leader
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-06-20 17:19:06.663 ERROR (qtp1002191352-169102) [c:crm_v2_01 s:shard3 r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on replica crm_v2_01_shard3_replica1/
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at crm_v2_01_shard3_replica1: Bad Request
request: crm_v2_01_shard3_replica1/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2
Remote error message: missing _version_ on update from leader
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-06-20 17:19:06.663 INFO (qtp1002191352-169102) [c:crm_v2_01 s:shard3 r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.c.ZkController Put replica core=crm_v2_01_shard3_replica1 coreNodeName=core_node13 on masked:8983_solr into leader-initiated recovery.
2018-06-20 17:19:06.663 INFO (qtp1002191352-169102) [c:crm_v2_01 s:shard3 r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.u.p.LogUpdateProcessorFactory [crm_v2_01_shard3_replica2] webapp=/solr path=/update params={update.distrib=TOLEADER&update.chain=add-unknown-fields-to-the-schema&distrib.from=http://masked:8983/solr/crm_v2_01_shard3_replica3/&wt=javabin&version=2}{delete=[note-20151333-8M821761N (-1603827973916459008)]} 0 4
2018-06-20 17:19:06.668 INFO (updateExecutor-2-thread-8226-processing-x:crm_v2_01_shard3_replica2 r:core_node4 crm_v2_01_shard3_replica3// n:masked:8983_solr s:shard3 c:crm_v2_01) [c:crm_v2_01 s:shard3 r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.c.LeaderInitiatedRecoveryThread Put replica core=crm_v2_01_shard3_replica3 coreNodeName=core_node12 on masked:8983_solr into leader-initiated recovery.
2018-06-20 17:19:06.668 WARN (updateExecutor-2-thread-8226-processing-x:crm_v2_01_shard3_replica2 r:core_node4 crm_v2_01_shard3_replica3// n:masked:8983_solr s:shard3 c:crm_v2_01) [c:crm_v2_01 s:shard3 r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.c.LeaderInitiatedRecoveryThread Leader is publishing core=crm_v2_01_shard3_replica3 coreNodeName =core_node12 state=down on behalf of un-reachable replica http://masked:8983/solr/crm_v2_01_shard3_replica3/

Thanks,
Sujatha

On Wed, Jun 20, 2018 at 11:18 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 6/15/2018 3:14 PM, sujatha sankaran wrote:
>
>> We were initially having an issue with DBQ (delete-by-query) and heavy
>> batch updates, which used to result in many missing updates.
>>
>> After reading many mails on the mailing list mentioning that DBQ and batch
>> updates do not work well together, we switched to DBI (delete-by-id). But
>> we are seeing the issue described in this Jira issue:
>> https://issues.apache.org/jira/browse/SOLR-7384
>>
>
> If you're using the implicit router on your multi-shard collection,
> deleting by ID may not work for you. There are a number of issues in Jira
> discussing various aspects of the problem.
> On a collection using the compositeId router, I would expect those deletes
> to work well.
>
>> Specifically, we are seeing this pattern:
>>
>> · There are several ERRORs and WARNs about “missing _version_”.
>>
>> · There is typically a single ERROR message.
>>
>> · There are several WARNs after that, and after a couple of WARNs there
>> is a message that leader-initiated recovery has been kicked off.
>>
>
> Can you share these log entries? The message on some of them is probably
> a dozen or more lines long, and may have multiple "Caused by" clauses that
> will also need to be included. Seeing the whole log could be useful.
>
>> *Setup info*:
>>
>> - SolrCloud 6.6.2
>> - 5 nodes, 5 shards, 3 replicas per shard
>> - ~35 million docs in the collection
>> - Nodes have 90GB RAM, 32GB of it allocated to the JVM heap
>> - Soft commit interval 2 seconds; hard commit (openSearcher false)
>> 15 seconds
>>
>
> Side notes:
>
> Solr would actually have more heap memory available if you set the heap to
> 31GB instead of 32GB.
>
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
>
> A 2-second soft commit interval is extremely aggressive. If your soft
> commits are happening really quickly (far less than 1 second), then this
> might not be a problem, but with an index as large as yours, it is very
> likely that soft commits are taking much longer than 2 seconds.
>
> Thanks,
> Shawn
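On the router point: with compositeId, the shard that owns a document can be computed from its id alone, so a delete-by-id can be forwarded straight to the owning shard; with the implicit router the id by itself does not identify the shard. As a rough illustration, here is a hypothetical Java sketch (the class and method names are made up, not Solr's) that assumes a plain id with no `!` prefix, MurmurHash3 x86_32 with seed 0 over the id's UTF-8 bytes, and the signed 32-bit hash space cut into equal slices. The real shard ranges are stored in the collection's cluster state, so the boundaries here may not match them exactly.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of compositeId-style routing for plain ids (no "!" prefix).
 * Assumptions: MurmurHash3 x86_32, seed 0, over the UTF-8 bytes of the id, and the
 * signed 32-bit hash space split into equal slices. Real shard ranges are stored in
 * the collection's cluster state and may differ slightly at the boundaries.
 */
public final class CompositeIdRoutingSketch {

    /** MurmurHash3 x86_32 over a byte array. */
    public static int murmur3x86_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = data.length & ~3; // length rounded down to a multiple of 4

        // Body: mix each 4-byte little-endian block.
        for (int i = 0; i < roundedEnd; i += 4) {
            int k = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | (data[i + 3] << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h1 ^= k;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: the last 1-3 bytes; the cases fall through on purpose.
        int tail = 0;
        switch (data.length & 3) {
            case 3:
                tail ^= (data[roundedEnd + 2] & 0xff) << 16;
            case 2:
                tail ^= (data[roundedEnd + 1] & 0xff) << 8;
            case 1:
                tail ^= data[roundedEnd] & 0xff;
                tail *= c1;
                tail = Integer.rotateLeft(tail, 15);
                tail *= c2;
                h1 ^= tail;
                break;
            default:
                break;
        }

        // Finalization (fmix32).
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** Hash of a plain document id under the assumptions above. */
    public static int idHash(String id) {
        return murmur3x86_32(id.getBytes(StandardCharsets.UTF_8), 0);
    }

    /** 0-based slice index when the full int range is cut into numShards equal parts. */
    public static int shardIndex(String id, int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        long offset = (long) idHash(id) - Integer.MIN_VALUE; // 0 .. 2^32 - 1
        return (int) ((offset * numShards) >>> 32);
    }

    public static void main(String[] args) {
        String id = "note-20151333-8M821761N"; // id from the delete in the log above
        System.out.println(id + " -> slice " + shardIndex(id, 5) + " of 5 (0-based)");
    }
}
```

Running `main` prints which equal-sized slice that id would land in under these assumptions; before drawing conclusions, compare it with the `range` values listed for each shard in the collection's `state.json` in ZooKeeper.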
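The 31GB vs 32GB heap advice comes from compressed object pointers: below roughly 32GB, HotSpot can store object references as 32-bit offsets scaled by the 8-byte object alignment, while above that every reference takes 8 bytes, so the same data needs more heap. A small Java sketch of the arithmetic, assuming HotSpot's default 8-byte object alignment (the class and method names are illustrative only):

```java
/**
 * Hypothetical sketch of the arithmetic behind the "31GB instead of 32GB" advice.
 * Assumes HotSpot-style compressed oops: a 32-bit offset scaled by the object
 * alignment (8 bytes by default).
 */
public final class CompressedOopsSketch {

    static final long GIB = 1024L * 1024 * 1024;

    /** Largest heap a 32-bit compressed pointer can address at the given alignment. */
    public static long maxCompressedHeapBytes(long objectAlignmentBytes) {
        return (1L << 32) * objectAlignmentBytes; // 2^32 addressable slots * alignment
    }

    /** Rough check only; in practice the JVM turns compressed oops off a little below this limit. */
    public static boolean fitsCompressedOops(long heapBytes, long objectAlignmentBytes) {
        return heapBytes < maxCompressedHeapBytes(objectAlignmentBytes);
    }

    public static void main(String[] args) {
        long limit = maxCompressedHeapBytes(8);
        System.out.println("limit = " + limit / GIB + " GiB");
        System.out.println("31 GiB heap fits: " + fitsCompressedOops(31 * GIB, 8));
        System.out.println("32 GiB heap fits: " + fitsCompressedOops(32 * GIB, 8));
    }
}
```

To see what the JVM actually chose, `java -Xmx32g -XX:+PrintFlagsFinal -version` lists the `UseCompressedOops` flag. For Solr, the heap size is normally set through `SOLR_HEAP` in `solr.in.sh`.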
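Why a 2-second soft commit interval can hurt when commits are slow: if a soft commit is triggered every 2 seconds but each one (including opening and warming the new searcher) takes longer than that, work starts to pile up. The hypothetical sketch below uses a deliberately simplified model where one commit starts every interval and each lasts a fixed duration, so the peak number in flight is ceil(duration / interval). Solr's real commit and searcher-warming pipeline is more involved, and the durations used here are made-up examples, not measurements from this cluster.

```java
/**
 * Simplified, hypothetical model: a soft commit starts every intervalMs and each one
 * takes durationMs to finish (including opening/warming the new searcher). This is
 * not how Solr schedules commits internally; it only illustrates the overlap.
 */
public final class SoftCommitOverlapSketch {

    /** Peak number of commits in flight at once: ceil(duration / interval). */
    public static long peakConcurrentCommits(long durationMs, long intervalMs) {
        if (durationMs <= 0 || intervalMs <= 0) {
            throw new IllegalArgumentException("duration and interval must be positive");
        }
        return (durationMs + intervalMs - 1) / intervalMs; // integer ceiling division
    }

    public static void main(String[] args) {
        long interval = 2_000; // the 2-second soft commit interval from the setup info
        // Illustrative commit durations only, not measured values.
        for (long duration : new long[] {500, 2_000, 5_000}) {
            System.out.printf("commit takes %d ms -> up to %d in flight%n",
                    duration, peakConcurrentCommits(duration, interval));
        }
    }
}
```

Under this model, commits that finish well within the interval never overlap, which matches the point that a 2-second interval is only safe when soft commits complete far faster than that.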