bq. I've noticed that some replicas stop receiving updates from the leader without any visible signs from the cluster status.
Hmm, yes, this isn't expected at all. What are you seeing that causes you to say this? You'd have to be monitoring the log for update messages to the replicas that aren't leaders, or the like. If anyone is going to have a prayer of reproducing this, we'll need more info on exactly what you're seeing and how you're measuring it. Have you changed any configurations in your replicas at all? We'd need the exact steps you performed if so.

On a quick test I didn't see this, but if it were that easy to reproduce I'd expect it to have shown up before.

NOTE: just looking at the cloud graph and having a node be active is not _necessarily_ sufficient for the node to be up to date. It _should_ be sufficient if (and only if) the node was shut down gracefully, but a "kill -9" or similar doesn't give the replicas on the node the opportunity to change their state. The "live_nodes" znode in ZooKeeper must also contain the node the replica resides on.

If you see this state again, you could try pinging the node directly. Does it respond? Your URL should look something like:

http://host:port/solr/colection_shard1_replica_t1/query?q=*:*&distrib=false

The "distrib=false" is important, as it means the query won't be forwarded to any other replica. If what you're reporting is really happening, that node should respond with a document count different from the other nodes.

NOTE: there's a delay between the time the leader indexes a doc and the time it's visible on the follower. Are you sure you're waiting for leader_commit_interval+polling_interval+autowarm_time before concluding that there's a problem? I'm a bit suspicious that checking the versions is leading you to conclude that your indexes are out of sync when really they're just catching up normally. If it's at all possible to turn off indexing for a few minutes when this happens, and everything just gets better, then it's not really a problem.

If we prove out that this is really happening as you think, then a JIRA (with steps to reproduce) is _definitely_ in order.
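For what it's worth, here is a rough sketch of the per-replica check described above, in Python. The function names, the host/port values and the core names are all placeholders I made up for illustration. It assumes the usual Solr JSON response shape, with the count under response.numFound, and it adds rows=0 and wt=json to the query so that only the count comes back. Treat it as a starting point, not a tested tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_query_url(base_url, core):
    """Build a match-all query URL that is answered by this core only."""
    params = urlencode({
        "q": "*:*",
        "rows": 0,            # we only need the count, not the documents
        "distrib": "false",   # do not forward the query to other replicas
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/solr/{core}/query?{params}"


def num_found(url, fetch=None):
    """Return numFound from a Solr JSON response.

    `fetch` is a hypothetical hook (url -> response body) so the parsing
    can be exercised without a running Solr; by default the URL is
    fetched over HTTP.
    """
    if fetch is None:
        with urlopen(url, timeout=10) as resp:
            body = resp.read()
    else:
        body = fetch(url)
    return json.loads(body)["response"]["numFound"]


def lagging_replicas(counts, leader):
    """Return the names of replicas whose count differs from the leader's."""
    expected = counts[leader]
    return sorted(name for name, n in counts.items()
                  if name != leader and n != expected)


if __name__ == "__main__":
    # Placeholder hosts and core names; substitute the real ones.
    replicas = {
        "node1": ("http://host1:8983", "collection_shard1_replica_t1"),
        "node2": ("http://host2:8983", "collection_shard1_replica_t3"),
    }
    counts = {name: num_found(replica_query_url(base, core))
              for name, (base, core) in replicas.items()}
    print(counts, lagging_replicas(counts, leader="node1"))
```

A mismatch only means something if indexing has been paused, or if the check is repeated after the commit, polling and autowarm delays have elapsed. A single differing count while updates are flowing is expected.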
Best,
Erick

On Wed, Oct 24, 2018 at 2:07 AM Vadim Ivanov <vadim.iva...@spb.ntk-intourist.ru> wrote:
>
> Hi All!
>
> I'm testing Solr 7.5 with TLOG replicas on SolrCloud with 5 nodes.
>
> My collection has shards and every shard has 3 TLOG replicas on different
> nodes.
>
> I've noticed that some replicas stop receiving updates from the leader
> without any visible signs from the cluster status.
>
> (all replicas active and green in Admin UI CLOUD graph). But indexversion of
> 'ill' replica not increasing with the leader.
>
> It seems to be dangerous, because that 'ill' replica could become a leader
> after restart of the nodes and I already experienced data loss.
>
> I didn't notice any meaningful records in solr log, except that probably
> problem occurs when leader changes.
>
> Meanwhile, I monitor indexversion of all replicas in a cluster by mbeans and
> recreate ill replicas when difference with the leader indexversion more
> than one
>
> Any suggestions?
>
> --
>
> Best regards, Vadim