I brought down the afflicted servers, waited 5 min, then brought them back up very slowly. That fixed the problem. The bad shard was assigned a leader. Great advice as usual.
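For anyone who wants to confirm the same thing without eyeballing the admin UI, here is a minimal sketch that scans a parsed CLUSTERSTATUS response for shards with no leader. It assumes the usual JSON shape (cluster -> collections -> shards -> replicas, with the leader replica carrying "leader": "true"). The function name and the sample data are made up for illustration and are not from my cluster.

```python
def shards_without_leader(cluster_status, collection):
    """Return the sorted names of shards in which no replica is flagged as leader.

    cluster_status is the parsed JSON of a Collections API CLUSTERSTATUS call.
    The nesting used here is an assumption about that response's shape.
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    leaderless = []
    for shard_name, shard in shards.items():
        replicas = shard.get("replicas", {}).values()
        # The leader replica is marked with the string "true", not a boolean.
        if not any(r.get("leader") == "true" for r in replicas):
            leaderless.append(shard_name)
    return sorted(leaderless)


# Hand-written sample (hypothetical): shard1 has an active replica but no leader.
sample = {
    "cluster": {
        "collections": {
            "colname": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"state": "active"},
                            "core_node3": {"state": "down"},
                        }
                    },
                    "shard2": {
                        "replicas": {
                            "core_node2": {"state": "active", "leader": "true"},
                        }
                    },
                }
            }
        }
    }
}

print(shards_without_leader(sample, "colname"))
```

An empty list after the rolling restart would mean every shard has an elected leader again.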
Erick Erickson wrote
> Yes. If indexing went through you'd lose docs so indexing will fail.
> Querying will fail too unless you set shards.tolerant.
>
> You really wouldn't want your docs lost is the reasoning.
>
> On Feb 2, 2017 6:56 AM, "tedsolr" <tsmith@> wrote:
>
>> Can I assume that without a leader the shard will not respond to write
>> requests? I can search on the collection. If I can't update docs or add
>> any new docs then this becomes an emergency.
>>
>> Erick Erickson wrote
>> > It's worth a try to take down your entire cluster. Bring one machine
>> > back up at a time. There _may_ be something like a 3 minute wait
>> > before each of the replicas on that machine come up, the leader
>> > election process has a 180 second delay before the replicas on that
>> > node take over leadership to wait for the last known good leader to
>> > come up.
>> >
>> > Continue bringing one node up at a time and wait patiently until all
>> > the replicas on it are green and until you have a leader for each
>> > shard elected. Bringing up the rest of the Solr nodes should be
>> > quicker then.
>> >
>> > Be sure to sequence things so you have known good Solr nodes come up
>> > first for the shard that's wonky. By that I mean that the first node
>> > you bring up for the leaderless shard should be the one with the best
>> > chance of having a totally OK index.
>> >
>> > Let's claim that the above does bring up a leader for each shard. If
>> > you still have a replica that refuses to come up, use the
>> > DELETEREPLICA command to remove it. Just for insurance, I'd take the
>> > Solr node down after the DELETEREPLICA and remove the entire core
>> > directory for the replica that didn't come up. Then restart the node
>> > and use the ADDREPLICA collections API command to put it back.
>> >
>> > If none of that works, you could try hand-editing the state.json file
>> > and _make_ one of the shards a leader (I'd do this with the Solr nodes
>> > down), but that's not for the faint of heart.
>> >
>> > Best,
>> > Erick
>> >
>> > On Wed, Feb 1, 2017 at 1:57 PM, Jeff Wartes <jwartes@> wrote:
>> >> Sounds similar to a thread last year:
>> >> http://lucene.472066.n3.nabble.com/Node-not-recovering-leader-elections-not-occuring-tp4287819p4287866.html
>> >>
>> >> On 2/1/17, 7:49 AM, "tedsolr" <tsmith@> wrote:
>> >>
>> >> I have version 5.2.1. Short of an upgrade, are there any remedies?
>> >>
>> >> Erick Erickson wrote
>> >> > What version of Solr? Since 5.4 there's been a FORCELEADER
>> >> > collections API call that might help.
>> >> >
>> >> > I'd run it with the newly added replicas offline. You only want it
>> >> > to have good replicas to choose from.
>> >> >
>> >> > Best,
>> >> > Erick
>> >> >
>> >> > On Wed, Feb 1, 2017 at 6:48 AM, tedsolr <tsmith@> wrote:
>> >> >> Update! I did find an error:
>> >> >>
>> >> >> 2017-02-01 09:23:22.673 ERROR org.apache.solr.common.SolrException
>> >> >> :org.apache.solr.common.SolrException: Error getting leader from zk for shard shard1
>> >> >> ....
>> >> >> Caused by: org.apache.solr.common.SolrException: Could not get leader props
>> >> >>     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1040)
>> >> >>     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1004)
>> >> >>     at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:960)
>> >> >>     ... 14 more
>> >> >> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>> >> >> KeeperErrorCode = NoNode for /collections/colname/leaders/shard1
>> >> >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>> >> >>
>> >> >> When I view the cluster status I see that this shard does not have a
>> >> >> leader. So it appears I need to force the leader designation to the
>> >> >> "active" replica. How do I do that?

--
View this message in context: http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-tp4318260p4318639.html
Sent from the Solr - User mailing list archive at Nabble.com.
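For the archive: the DELETEREPLICA / ADDREPLICA fallback quoted above was not needed in my case, but it can be sketched as two Collections API requests. The snippet below only builds the URLs and sends nothing. The base URL, collection, shard, replica and node names are placeholders I made up, and the helper names are hypothetical. Between the two calls the node would be stopped and the dead replica's core directory removed by hand, as Erick describes.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL for the given action (nothing is sent)."""
    query = urlencode({"action": action, **params, "wt": "json"})
    return f"{base_url}/admin/collections?{query}"


def replace_replica_urls(base_url, collection, shard, replica, node):
    """Return (delete_url, add_url) for dropping a stuck replica and re-adding one.

    replica is the core node name shown in the cluster state, node is the
    Solr node that should host the new replica. Both are placeholders here.
    """
    delete_url = collections_api_url(
        base_url, "DELETEREPLICA",
        collection=collection, shard=shard, replica=replica,
    )
    add_url = collections_api_url(
        base_url, "ADDREPLICA",
        collection=collection, shard=shard, node=node,
    )
    return delete_url, add_url


# Placeholder values, not taken from a real cluster.
delete_url, add_url = replace_replica_urls(
    "http://localhost:8983/solr", "colname", "shard1",
    "core_node3", "host2:8983_solr",
)
print(delete_url)
print(add_url)
```

The FORCELEADER action mentioned in the thread is deliberately left out, since it only exists from Solr 5.4 on and I am on 5.2.1.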