Re: leader election stuck after hosts restarts

2021-01-22 Thread Pierre Salagnac
Thanks Alessandro. We found this Jira ticket that may be the root cause of this issue: https://issues.apache.org/jira/browse/SOLR-14356 I'm not sure whether it is the reason the leader election initially failed, but it prevents Solr from exiting this error loop. On Wed, Jan 13, 2021 at 21:37

Re: leader election stuck after hosts restarts

2021-01-13 Thread Alessandro Benedetti
I faced these problems a while ago, and at the time I wrote a blog post which I hope can help: https://sease.io/2018/05/solrcloud-leader-election-failing.html -- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- Sent from

Re: leader election stuck after hosts restarts

2021-01-12 Thread Pierre Salagnac
Sorry I missed this detail. We are running Solr 8.2. Thanks On Tue, Jan 12, 2021 at 16:46, Phill Campbell wrote: > Which version of Apache Solr? > > > On Jan 12, 2021, at 8:36 AM, Pierre Salagnac > wrote: > > > > Hello, > > We had a stuck leader elec

Re: leader election stuck after hosts restarts

2021-01-12 Thread Phill Campbell
Which version of Apache Solr? > On Jan 12, 2021, at 8:36 AM, Pierre Salagnac > wrote: > > Hello, > We had a stuck leader election for a shard. > > We have collections with 2 shards, each shard has 5 replicas. We have many > collections but the issue happened for a sin

Re: leader election stuck after hosts restarts

2021-01-12 Thread matthew sporleder
Salagnac wrote: > > Hello, > We had a stuck leader election for a shard. > > We have collections with 2 shards, each shard has 5 replicas. We have many > collections but the issue happened for a single shard. Once all host > restarts completed, this shard was stuck with one

leader election stuck after hosts restarts

2021-01-12 Thread Pierre Salagnac
Hello, We had a stuck leader election for a shard. We have collections with 2 shards, each shard has 5 replicas. We have many collections but the issue happened for a single shard. Once all host restarts completed, this shard was stuck with one replica in "recovery" state and all othe

Solr8 improvements to SolrCloud leader election

2020-06-02 Thread Danny Shih
Are there any significant (or not so significant) changes? I have browsed the release notes and searched JIRA, but the latest news seems to be in 7.3 (where the old Leader-In-Recovery logic was replaced). Context: We are currently running Solr 7.4 in production. In the past year, we’ve seen

Re: StackOverflowError leader election on 8.2.0

2019-08-21 Thread Mikhail Khludnev
> Looking this up i found SOLR-5692, but that was solved a lifetime ago, It wasn't. https://issues.apache.org/jira/browse/SOLR-5692?focusedCommentId=14556876&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14556876 On Wed, Aug 21, 2019 at 1:29 PM Markus Jelsma wrote:

StackOverflowError leader election on 8.2.0

2019-08-21 Thread Markus Jelsma
Hello, Looking this up I found SOLR-5692, but that was solved a lifetime ago, so just checking if this is a familiar error and one I'm missing in Jira: A client's Solr 8.2.0 cluster brought us the following StackOverflowError while running 8.2.0 on Java 8: Exception in thread

Re: SolrCloud 7.2 problem with leader election

2018-04-04 Thread Gael Jourdan-Weil
Using property legacyCloud=true, coreNodeNames are well written by Solr in core.properties file. We are wondering if the problem comes from our configuration or the bugfix https://issues.apache.org/jira/browse/SOLR-11503 ? _*Without legacyCloud=true:*_ > Our configuration before Solr

SolrCloud 7.2 problem with leader election

2018-04-03 Thread Gael Jourdan-Weil
Hello, We are trying to upgrade from Solr 6.6 to Solr 7.2.1 and we are using Solr Cloud. Doing some tests with 2 replicas, ZooKeeper doesn't know which one to elect as a leader: ERROR org.apache.solr.cloud.ZkController:getLeader:1206 - Error getting leader from zk

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Thanks Shawn. Yes, I did index some docs after moving to 6.4.0. The release notes did not mention anything about the format being changed, so I thought it would be backward compatible. Yeah, my only recourse is to re-index the data. Apart from that, there were weird problems overall with 6.4.0. I was excited

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Shawn Heisey
On 2/2/2017 7:23 AM, Ravi Solr wrote: > When i try to rollback from 6.4.0 to my original version of 6.0.1 it now > throws another issue. Now I cant go to 6.4.0 nor can I roll back to 6.0.1 > > Could not load codec 'Lucene62'. Did you forget to add > lucene-backward-codecs.jar? > at

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Thanks Hendrik. I am baffled as to why I did not hit this issue prior to moving to 6.4.0. On Thu, Feb 2, 2017 at 7:58 AM, Hendrik Haddorp wrote: > Might be that your overseer queue overloaded. Similar to what is described > here: >

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
When I try to roll back from 6.4.0 to my original version of 6.0.1 it now throws another issue. Now I can't go to 6.4.0 nor can I roll back to 6.0.1 Could not load codec 'Lucene62'. Did you forget to add lucene-backward-codecs.jar? at

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Hendrik Haddorp
Might be that your overseer queue overloaded. Similar to what is described here: https://support.lucidworks.com/hc/en-us/articles/203959903-Bringing-up-downed-Solr-servers-that-don-t-want-to-come-up If the overseer queue gets too long you get hit by this:

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Following up on my previous email, the intermittent server unavailability seems to be linked to the interaction between Solr and Zookeeper. Can somebody help me understand what this error means and how to recover from it. 2017-02-02 09:44:24.648 ERROR

6.4.0 collection leader election and recovery issues

2017-02-01 Thread Ravi Solr
Hello, Yesterday I upgraded from 6.0.1 to 6.4.0, and it has been a straight 12-hour debugging spree!! Can somebody kindly help me out of this misery. I have a set of 8 single-shard collections with 3 replicas. As soon as I updated the configs and started the servers one of my collections got

Collection going to recovery mode - Leader election issue?

2016-08-02 Thread Aswath Srinivasan (TMS)
like a leader election issue? 2016-07-29 06:52:48.610 ERROR (coreZkRegister-1-thread-32-processing-s:shard2 x:tCollection_shard2_replica4 c:tCollection n:tsolr.prod2.xxx.com:8983_solr r:core_node6) [c:tCollection s:shard2 r:core_node6 x:tCollection_shard2_replica4] o.a.s.c.ZkController Error

Re: Overriding SolrCloud Leader Election and manually assign leadership?-Is it possible?

2016-03-24 Thread Erick Erickson
have seen > whenever the one of the boxes is leader in solrcloud,the performance seems > to be really good. However the leader election changes from time to time and > most of the time the cloud boxes seem to process most of the traffic > Currently our solrcloud looks something like this >

Overriding SolrCloud Leader Election and manually assign leadership?-Is it possible?

2016-03-24 Thread ram
to be really good. However the leader election changes from time to time and most of the time the cloud boxes seem to process most of the traffic Currently our solrcloud looks something like this Physical Box 1 X ->shard 1 Clou

Leader election issues after upgrade from 4.10.4 to 5.4.1

2016-02-08 Thread Mike Thomsen
We get this error on one of our nodes: Caused by: org.apache.solr.common.SolrException: There is conflicting information about the leader of shard: shard2 our state says: http://server01:8983/solr/collection/ but zookeeper says: http://server02:8983/collection/ Then I noticed this in the log:
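The two URLs in the error above differ in more than the host name. A minimal Python sketch, using only the two URLs quoted in the message, that lists which URL components disagree. The helper name url_differences is made up for illustration and is not part of Solr; the sketch only compares strings and does not say which side is correct.

```python
from urllib.parse import urlsplit


def url_differences(local_url, zk_url):
    """Return {component: (local_value, zk_value)} for components that differ."""
    local, zk = urlsplit(local_url), urlsplit(zk_url)
    diffs = {}
    for field in ("scheme", "hostname", "port", "path"):
        left, right = getattr(local, field), getattr(zk, field)
        if left != right:
            diffs[field] = (left, right)
    return diffs


# URLs copied from the error message in the post above.
diffs = url_differences(
    "http://server01:8983/solr/collection/",
    "http://server02:8983/collection/",
)
for component, (local_value, zk_value) in diffs.items():
    print(f"{component}: our state={local_value!r} zookeeper={zk_value!r}")
```

Here both the host and the path differ: the ZooKeeper side has no /solr prefix. Whether that points to a context-path change during the upgrade is only a guess; the truncated post does not confirm it.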

Leader Election Time

2016-01-15 Thread Robert Brown
Hi, I have 2 shards, 1 leader and 1 replica in each. I've just removed a leader from one of the shards but the replica hasn't become a leader yet. How quickly should this normally happen? tickTime=2000 dataDir=/home/rob/zoodata clientPort=2181 initLimit=5 syncLimit=2 Thanks, Rob
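For reference, the ZooKeeper settings quoted above are expressed in ticks, so the actual time windows follow from tickTime. A small Python sketch of that arithmetic, assuming ZooKeeper's documented defaults for session timeout bounds (minimum 2 x tickTime, maximum 20 x tickTime, unless minSessionTimeout/maxSessionTimeout are set). The function names are hypothetical. How long a replica takes to become leader also depends on Solr-side settings that the post does not show.

```python
def parse_zoo_cfg(text):
    """Parse whitespace-separated key=value pairs; numeric values become ints."""
    cfg = {}
    for token in text.split():
        key, _, value = token.partition("=")
        cfg[key] = int(value) if value.isdigit() else value
    return cfg


def derived_timeouts_ms(cfg):
    """Convert tick-based limits to milliseconds."""
    tick = cfg["tickTime"]
    return {
        "init_limit_ms": cfg["initLimit"] * tick,   # follower initial sync window
        "sync_limit_ms": cfg["syncLimit"] * tick,   # follower may lag this long
        "min_session_timeout_ms": 2 * tick,         # assumed ZooKeeper default bound
        "max_session_timeout_ms": 20 * tick,        # assumed ZooKeeper default bound
    }


# Settings copied from the post above.
ZOO_CFG = "tickTime=2000 dataDir=/home/rob/zoodata clientPort=2181 initLimit=5 syncLimit=2"
timeouts = derived_timeouts_ms(parse_zoo_cfg(ZOO_CFG))
for name, value in timeouts.items():
    print(f"{name} = {value}")
```

With these values a client session timeout is negotiated somewhere between 4 and 40 seconds, which bounds how quickly ZooKeeper notices that a node holding an ephemeral election entry has gone away.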

Re: Zookeeper Quorum leader election

2015-10-22 Thread Arcadius Ahouansou
The leader election issue we were having was solved by passing -Djava.net.preferIPv4Stack=true to zookeeper startup script It seems our Linux servers have IPv6 enabled but we have no IPv6 network. Hope this helps others. Arcadius. On 4 September 2015 at 04:57, Arcadius Ahouansou <ar

Re: Zookeeper Quorum leader election

2015-10-22 Thread Erick Erickson
Thanks for adding that to our collective knowledge store! On Thu, Oct 22, 2015 at 2:44 AM, Arcadius Ahouansou <arcad...@menelic.com> wrote: > The leader election issue we were having was solved by passing > > -Djava.net.preferIPv4Stack=true > > to zookeeper startup script

Zookeeper Quorum leader election

2015-09-03 Thread Arcadius Ahouansou
We have a quorum of 3 ZK nodes zk1, zk2 and zk3. All nodes are identical. After multiple restarts of the ZK nodes, always keeping the majority of 2, we have noticed that the node zk1 has never become the leader. Only zk2 and zk3 become leader. 1) Is there any known reason or possible

Leader election

2015-07-29 Thread Olivier Damiot
, all my collections are down. When I look in the logs I can see leader election problems, e.g.: - Checking if I (core = test339_shard1_replica1, coreNodeName = core_node5) should try and be the leader. - Cloud says we are still state leader. I feel that all servers pass the buck! I do

Re: Leader election

2015-07-29 Thread Timothy Potter
look in the logs I can see leader election problems, e.g.: - Checking if I (core = test339_shard1_replica1, coreNodeName = core_node5) should try and be the leader. - Cloud says we are still state leader. I feel that all servers pass the buck! I do not understand this error especially

Re: Issue when zookeeper session expires during shard leader election.

2015-07-28 Thread Shalin Shekhar Mangar
Hi Mike, Yes, please open a new Jira issue and attach your patch there. We can discuss more on the issue. On Tue, Jul 28, 2015 at 11:40 AM, Michael Roberts mrobe...@tableau.com wrote: Hey, I am encountering an issue which looks a lot like https://issues.apache.org/jira/browse/SOLR-6763.

Issue when zookeeper session expires during shard leader election.

2015-07-28 Thread Michael Roberts
Hey, I am encountering an issue which looks a lot like https://issues.apache.org/jira/browse/SOLR-6763. However, it seems like the fix for that does not address the entire problem. That fix will only work if we hit the zkClient.getChildren() call before the reconnect logic has finished

Re: Sync failure after shard leader election when adding new replica.

2015-05-26 Thread Erick Erickson
Please, please, please do _not_ try to use core discovery to add new replicas by manually editing stuff. bq: and my deployment tools create an empty core on newly provisioned machines. This is a really bad idea (as you have discovered). Basically, your deployment tools have to do everything

Sync failure after shard leader election when adding new replica.

2015-05-26 Thread Michael Roberts
Hi, I have a SolrCloud setup, running 4.10.3. The setup consists of several cores, each with a single shard and initially each shard has a single replica (so, basically, one machine). I am using core discovery, and my deployment tools create an empty core on newly provisioned machines. The

Re: SolrCloud Leader Election

2015-05-22 Thread Ryan Steele
Restarting the node cleared out the problem and everything recovered. Thanks! On 5/21/15 5:42 AM, Ramkumar R. Aiyengar wrote: This shouldn't happen, but if it does, there's no good way currently for Solr to automatically fix it. There are a couple of issues being worked on to do that

Re: SolrCloud Leader Election

2015-05-21 Thread Ramkumar R. Aiyengar
This shouldn't happen, but if it does, there's no good way currently for Solr to automatically fix it. There are a couple of issues being worked on to do that currently. But till then, your best bet is to restart the node which you expect to be the leader (you can look at ZK to see who is at the
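As background for inspecting ZooKeeper here, a hedged Python sketch of how election entries can be put in order. It assumes that each candidate registers an ephemeral sequential znode whose name ends in -n_ followed by a zero-padded sequence number, and that the lowest sequence is at the head of the queue. The node names below are invented for illustration and the helpers are not Solr APIs. The quoted message is truncated, so this may not be exactly the check the author had in mind.

```python
import re

# Assumed naming convention: "<sessionId>-<coreNodeName>-n_<sequence>".
SEQ_PATTERN = re.compile(r"-n_(\d+)$")


def sequence_of(node_name):
    """Extract the numeric sequence suffix from an election node name."""
    match = SEQ_PATTERN.search(node_name)
    if match is None:
        raise ValueError(f"not an election node name: {node_name!r}")
    return int(match.group(1))


def election_order(node_names):
    """Sort election node names by sequence number, lowest (head) first."""
    return sorted(node_names, key=sequence_of)


# Hypothetical children of a shard's election znode, in arbitrary order.
children = [
    "1000000000000000002-core_node5-n_0000000007",
    "1000000000000000001-core_node3-n_0000000004",
    "1000000000000000003-core_node8-n_0000000009",
]
ordered = election_order(children)
head = ordered[0]
print("head of election queue:", head)
```

Sorting on the numeric suffix rather than the full name matters because the session id prefix varies per node and would otherwise dominate a plain string sort.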

SolrCloud Leader Election

2015-05-20 Thread Ryan Steele
My SolrCloud cluster isn't reassigning the collection leaders from downed cores--the downed cores are still listed as the leaders. The cluster has been in this state for a few hours and the logs continue to report No registered leader was found after waiting for 4000ms. Is there a way to force

Manual leader election in SolrCloud

2014-10-13 Thread Sachin Kale
Is it possible to elect the leader manually in SOLR Cloud 4.10.1? -Sachin-

Re: Manual leader election in SolrCloud

2014-10-13 Thread Erick Erickson
Not to my knowledge. There's quite a bit of work going on around leader balancing, see the umbrella issue at https://issues.apache.org/jira/browse/SOLR-6491. That work won't quite do what you want in the sense that you can't say nodeX you become the leader though. The way that set of operations

Re: Manual leader election in SolrCloud

2014-10-13 Thread sachinpkale
Thanks for the info. I will wait for the next release then. Will it come with 4.10.2? -- View this message in context: http://lucene.472066.n3.nabble.com/Manual-leader-election-in-SolrCloud-tp4164047p4164115.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Manual leader election in SolrCloud

2014-10-13 Thread Erick Erickson
13, 2014 at 9:33 PM, sachinpkale sachinpk...@gmail.com wrote: Thanks for the info. I will wait for the next release then. Will it come with 4.10.2? -- View this message in context: http://lucene.472066.n3.nabble.com/Manual-leader-election-in-SolrCloud-tp4164047p4164115.html Sent from

Race condition in Leader Election

2014-04-15 Thread Rich Mayfield
I see something similar where, given ~1000 shards, both nodes spend a LOT of time sorting through the leader election process. Roughly 30 minutes. I too am wondering - if I force all leaders onto one node, then shut down both, then start up the node with all of the leaders on it first

Re: Race condition in Leader Election

2014-04-15 Thread Mark Miller
We have to fix that then. --  Mark Miller about.me/markrmiller On April 15, 2014 at 12:20:03 PM, Rich Mayfield (mayfield.r...@gmail.com) wrote: I see something similar where, given ~1000 shards, both nodes spend a LOT of time sorting through the leader election process. Roughly 30 minutes

Race condition in Leader Election

2014-03-06 Thread KNitin
the leader. Is there a way to force leader election for a shard for solrcloud? Is there a way to break ties automatically (without restarting nodes) to make a node as the leader for the shard? Thanks Nitin

Re: Race condition in Leader Election

2014-03-06 Thread Mark Miller
No servers hosting this shard. To fix this, I either unload one core or restart one of the nodes again so that one of them becomes the leader. Is there a way to force leader election for a shard for solrcloud? Is there a way to break ties automatically (without restarting nodes) to make

Re: Race condition in Leader Election

2014-03-06 Thread KNitin
where both the replicas for a shard get into recovering state and never come up causing the error No servers hosting this shard. To fix this, I either unload one core or restart one of the nodes again so that one of them becomes the leader. Is there a way to force leader election

SolrCloud 4.6.0 - leader election issue

2013-12-09 Thread Elodie Sannier
-dev-xen-06-vm-07.dev.dc1.kelkoo.net:8080/searchsolrnodefr/fr_green/ shard1 Is it a bug with the leader election ? This problem does not occur : - with the version 4.5.1. - or if I start the four solr instances with a delay between them (about 15 seconds). - or if I configure only one

RE: SolrCloud 4.6.0 - leader election issue

2013-12-09 Thread Markus Jelsma
-thread-2] INFO org.apache.solr.cloud.ShardLeaderElectionContext:runLeaderProcess:251 - I am the new leader: http://dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:8080/searchsolrnodefr/fr_green/ shard1 Is it a bug with the leader election ? This problem does not occur : - with the version

Leader election fails in some point.

2013-10-18 Thread yriveiro
) No leader means we can't index data because a 503 http status code is returned. Is this the normal behaviour or a bug? - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-election-fails-in-some-point-tp4096514.html Sent from the Solr - User mailing list

Leader election

2013-08-23 Thread Srivatsan
page, leader election didn't get triggered for that collection. http://lucene.472066.n3.nabble.com/file/n4086259/Screenshot.png I wasn't able to index into that collection but I was able to search it. Please help me with this issue. Thanks in advance Srivatsan -- View this message

Re: Leader election

2013-08-23 Thread Shalin Shekhar Mangar
(AbstractUpdateRequest.java:117) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)/ after that i checked solr admin page, leader election didnt get triggered for that collection. http://lucene.472066

Re: Leader election

2013-08-23 Thread Srivatsan
almost 15 minutes. After that i restarted the entire cluster. I am using solr 4.4 with 1 shard and 3 replicas -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-election-tp4086259p4086287.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Leader election

2013-08-23 Thread Shalin Shekhar Mangar
Any exceptions in the logs of other replicas? The default leaderVoteWait time is 3 minutes after which a leader election should have been initiated automatically. On Fri, Aug 23, 2013 at 4:01 PM, Srivatsan ranjith.venkate...@gmail.com wrote: almost 15 minutes. After that i restarted the entire

Re: Leader election

2013-08-23 Thread Srivatsan
No exceptions. And the leaderVoteWait value will be used only during startup, right? A new leader will be elected once the leader node is down. Am I right? -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-election-tp4086259p4086290.html Sent from the Solr - User mailing

Re: Wrong leader election leads to shard removal

2013-08-16 Thread Ido Kissos
Yes, I have erased the tlog in replica 2 and it appears that the first replica's tlog was corrupted because of an ungraceful servlet shutdown. Unfortunately there was no log for it, nor did the zookeeper log record anything about this. Is there a place I could check in the zookeeper what

Re: Wrong leader election leads to shard removal

2013-08-16 Thread Erick Erickson
bq:why does it replicate all the index instead of copying just the newer formed segments because there's no guarantee that the segments are identical on the nodes that make up a shard. The simplest way to conceptualize this is to consider the autocommit settings on the servers Let's say the hard

Wrong leader election leads to shard removal

2013-08-14 Thread Manuel Le Normand
zookeeper leader election - good state replicas (sub-cluster 1) replicated from empty replicas (sub-cluster 2) ending up in removing all documents in these shards!! These are the logs from solr-prod32 (sub cluster #2 - bad state) - the shard1_replica1 is elected to be leader although it was not before

Re: Wrong leader election leads to shard removal

2013-08-14 Thread Manuel Le Normand
of the replicas that were in a replicating stage there was a wrong zookeeper leader election - good state replicas (sub-cluster 1) replicated from empty replicas (sub-cluster 2) ending up in removing all documents in these shards!! These are the logs from solr-prod32 (sub cluster #2 - bad state

Re: Wrong leader election leads to shard removal

2013-08-14 Thread Mark Miller
discovered the disaster - on part of the replicas that were in a replicating stage there was a wrong zookeeper leader election - good state replicas (sub-cluster 1) replicated from empty replicas (sub-cluster 2) ending up in removing all documents in these shards!! These are the logs from solr

Re: Leader Election, when?

2013-07-12 Thread Furkan KAMACI
? -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-Election-when-tp4077381.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Leader Election, when?

2013-07-12 Thread Erick Erickson
. Shouldn't it distribute leaders? If i deliver some stress to a double-leader instance, is Zookeeper going to run an election? -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-Election-when-tp4077381.html Sent from the Solr - User mailing list archive at Nabble.com.

Leader Election, when?

2013-07-11 Thread aabreur
deliver some stress to a double-leader instance, is Zookeeper going to run an election? -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-Election-when-tp4077381.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Leader election deadlock after restarting leader in 4.2.1

2013-06-04 Thread John Guerrero
https://issues.apache.org/jira/browse/SOLR-4900 -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-election-deadlock-after-restarting-leader-in-4-2-1-tp4067988p4068238.html Sent from the Solr - User mailing list archive at Nabble.com.

Leader election deadlock after restarting leader in 4.2.1

2013-06-03 Thread John Guerrero
and wait for a new leader. This still results in a few No registered leader was found exceptions, but at least the duration is short. -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-election-deadlock-after-restarting-leader-in-4-2-1-tp4067988.html Sent from the Solr - User

Re: Leader election deadlock after restarting leader in 4.2.1

2013-06-03 Thread Mark Miller
do a restart. We just stop the leader and wait for a new leader. This still results in a few No registered leader was found exceptions, but at least the duration is short. -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-election-deadlock-after-restarting

SolrCloud leader election on single node

2012-10-25 Thread AlexeyK
) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220) -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-leader-election-on-single-node-tp4015804.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud leader election on single node

2012-10-25 Thread Mark Miller
(ZkStateReader.java:399) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:318) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220) -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-leader-election-on-single-node