I brought down the afflicted servers, waited 5 min, then brought them back up
very slowly. That fixed the problem. The bad shard was assigned a leader.
Great advice as usual.
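
For anyone following the same procedure, here is a minimal sketch of how one might confirm that every shard has a leader before bringing up the next node. It assumes the JSON layout I believe the Collections API CLUSTERSTATUS action returns (cluster, collections, shards, replicas, with a "leader":"true" flag on the leading replica). The function name and the sample data (core and node names) are hypothetical, so check them against your own cluster's output.

```python
from typing import Any, Dict, List


def leaderless_shards(cluster_status: Dict[str, Any], collection: str) -> List[str]:
    """Return the names of shards in `collection` with no replica flagged as leader."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    missing = []
    for shard_name, shard in shards.items():
        replicas = shard.get("replicas", {}).values()
        # Assumption: the leader flag is the string "true", absent on non-leaders.
        if not any(r.get("leader") == "true" for r in replicas):
            missing.append(shard_name)
    return sorted(missing)


# Hypothetical, hand-written sample shaped like a parsed CLUSTERSTATUS response.
sample = {
    "cluster": {
        "collections": {
            "colname": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"state": "active", "node_name": "host1:8983_solr"},
                            "core_node2": {"state": "down", "node_name": "host2:8983_solr"},
                        }
                    },
                    "shard2": {
                        "replicas": {
                            "core_node3": {
                                "state": "active",
                                "node_name": "host1:8983_solr",
                                "leader": "true",
                            }
                        }
                    },
                }
            }
        }
    }
}

print(leaderless_shards(sample, "colname"))  # prints ['shard1']
```

An empty list after each node restart would suggest it is safe to move on to the next node.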

Erick Erickson wrote
> Yes. If indexing went through you'd lose docs so indexing will fail.
> Querying will fail too unless you set shards.tolerant.
> 
> You really wouldn't want your docs lost is the reasoning.
> 
> On Feb 2, 2017 6:56 AM, "tedsolr" <tsmith@> wrote:
> 
>> Can I assume that without a leader the shard will not respond to write
>> requests? I can search on the collection. If I can't update docs or add
>> any new docs then this becomes an emergency.
>>
>>
>> Erick Erickson wrote
>> > It's worth a try to take down your entire cluster. Bring one machine
>> > back up at a time. There _may_ be something like a 3 minute wait
>> > before each of the replicas on that machine comes up: the leader
>> > election process has a 180 second delay before the replicas on that
>> > node take over leadership, to wait for the last known good leader to
>> > come up.
>> >
>> > Continue bringing one node up at a time and wait patiently until all
>> > the replicas on it are green and until you have a leader for each
>> > shard elected. Bringing up the rest of the Solr nodes should be
>> > quicker then.
>> >
>> > Be sure to sequence things so you have known good Solr nodes come up
>> > first for the shard that's wonky. By that I mean that the first node
>> > you bring up for the leaderless shard should be the one with the best
>> > chance of having a totally OK index.
>> >
>> >
>> > Let's claim that the above does bring up a leader for each shard. If
>> > you still have a replica that refuses to come up, use the
>> > DELETEREPLICA command to remove it. Just for insurance, I'd take the
>> > Solr node down after the DELETEREPLICA and remove the entire core
>> > directory for the replica that didn't come up. Then restart the node
>> > and use the ADDREPLICA collections API command to put it back.
>> >
>> > If none of that works, you could try hand-editing the state.json file
>> > and _make_ one of the replicas a leader (I'd do this with the Solr nodes
>> > down), but that's not for the faint of heart.
>> >
>> > Best,
>> > Erick
>> >
>> > On Wed, Feb 1, 2017 at 1:57 PM, Jeff Wartes <jwartes@> wrote:
>> >> Sounds similar to a thread last year:
>> >> http://lucene.472066.n3.nabble.com/Node-not-recovering-leader-elections-not-occuring-tp4287819p4287866.html
>> >>
>> >>
>> >>
>> >> On 2/1/17, 7:49 AM, "tedsolr" <tsmith@> wrote:
>> >>
>> >>     I have version 5.2.1. Short of an upgrade, are there any remedies?
>> >>
>> >>
>> >>     Erick Erickson wrote
>> >>     > What version of Solr? Since 5.4 there's been a FORCELEADER
>> >>     > Collections API call that might help.
>> >>     >
>> >>     > I'd run it with the newly added replicas offline. You only want it to
>> >>     > have good replicas to choose from.
>> >>     >
>> >>     > Best,
>> >>     > Erick
>> >>     >
>> >>     > On Wed, Feb 1, 2017 at 6:48 AM, tedsolr <tsmith@> wrote:
>> >>     >> Update! I did find an error:
>> >>     >>
>> >>     >> 2017-02-01 09:23:22.673 ERROR org.apache.solr.common.SolrException
>> >>     >> :org.apache.solr.common.SolrException: Error getting leader from zk for shard shard1
>> >>     >> ....
>> >>     >> Caused by: org.apache.solr.common.SolrException: Could not get leader props
>> >>     >>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1040)
>> >>     >>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1004)
>> >>     >>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:960)
>> >>     >>         ... 14 more
>> >>     >> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/colname/leaders/shard1
>> >>     >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>> >>     >>
>> >>     >> When I view the cluster status I see that this shard does not have a
>> >>     >> leader.
>> >>     >> So it appears I need to force the leader designation to the "active"
>> >>     >> replica. How do I do that?
>> >>     >>
>> >>     >>
>> >>     >>
>> >>     >> --
>> >>     >> View this message in context:
>> >>     >> http://lucene.472066.n3.nabble.com/Collection-will-not-replicate-tp4318260p4318265.html
>> >>     >> Sent from the Solr - User mailing list archive at Nabble.com.





