Thanks Alessandro.
We found this Jira ticket that may be the root cause of this issue:
https://issues.apache.org/jira/browse/SOLR-14356
I'm not sure whether it is the reason the leader election initially
failed, but it prevents Solr from exiting this error loop.
On Wed, Jan 13, 2021 at 21:37, Alessandro Benedetti wrote:
I faced these problems a while ago; at the time I created a blog post
which I hope can help:
https://sease.io/2018/05/solrcloud-leader-election-failing.html
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
Sorry I missed this detail.
We are running Solr 8.2.
Thanks
On Tue, Jan 12, 2021 at 16:46, Phill Campbell wrote:
> Which version of Apache Solr?
>
> > On Jan 12, 2021, at 8:36 AM, Pierre Salagnac wrote:
Hello,
We had a stuck leader election for a shard.
We have collections with 2 shards, each shard has 5 replicas. We have many
collections but the issue happened for a single shard. Once all host
restarts completed, this shard was stuck with one replica in "recovery"
state and all other
Are there any significant (or not so significant) changes? I have browsed the
release notes and searched JIRA, but the latest news seems to be in 7.3 (where
the old Leader-Initiated Recovery (LIR) logic was replaced).
Context:
We are currently running Solr 7.4 in production. In the past year, we’ve seen
> Looking this up I found SOLR-5692, but that was solved a lifetime ago,
It wasn't.
https://issues.apache.org/jira/browse/SOLR-5692?focusedCommentId=14556876&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14556876
On Wed, Aug 21, 2019 at 1:29 PM Markus Jelsma wrote:
Hello,
Looking this up I found SOLR-5692, but that was solved a lifetime ago, so I am just
checking whether this is a familiar error and one I am missing in Jira:
A client's Solr 8.2.0 cluster gave us the following StackOverflowError while
running 8.2.0 on Java 8:
Exception in thread
With the property legacyCloud=true, coreNodeNames are correctly written by Solr
to the core.properties file.
We are wondering whether the problem comes from our configuration or from the
bugfix https://issues.apache.org/jira/browse/SOLR-11503 ?
Without legacyCloud=true:
> Our configuration before Solr
Hello,
We are trying to upgrade from Solr 6.6 to Solr 7.2.1 and we are using Solr
Cloud.
While doing some tests with 2 replicas, we see that ZooKeeper doesn't know which
one to elect as a leader:
ERROR org.apache.solr.cloud.ZkController:getLeader:1206 - Error getting leader
from zk
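In case it helps with the debugging: the record that getLeader() fails to read lives in
ZooKeeper, and you can dump it directly to see what (if anything) is registered there.
A minimal sketch using the plain ZooKeeper client, assuming a hypothetical connect
string and collection name, and assuming the leader znode sits at
/collections/<collection>/leaders/<shard>/leader (the exact layout differs slightly
between Solr versions):

import org.apache.zookeeper.ZooKeeper;

public class DumpLeaderZnode {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZK connect string and collection/shard names; adjust to your cluster.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> {});
        // Assumed znode layout; it varies a little across Solr versions.
        String path = "/collections/mycollection/leaders/shard1/leader";
        // The data is a small JSON blob with the leader's core name, base_url and node_name;
        // a KeeperException.NoNodeException here means no leader is currently registered.
        byte[] data = zk.getData(path, false, null);
        System.out.println(new String(data, "UTF-8"));
        zk.close();
    }
}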
Thanks Shawn. Yes I did index some docs after moving to 6.4.0. The release
notes did not mention anything about format being changed so I thought it
would be backward compatible. Yeah my only recourse is to re-index data.
Apart from that, there were weird problems overall with 6.4.0. I was excited
On 2/2/2017 7:23 AM, Ravi Solr wrote:
> When I try to roll back from 6.4.0 to my original version of 6.0.1 it now
> throws another issue. Now I can't go to 6.4.0 nor can I roll back to 6.0.1
>
> Could not load codec 'Lucene62'. Did you forget to add
> lucene-backward-codecs.jar?
> at
Thanks Hendrik. I am baffled as to why I did not hit this issue prior to
moving to 6.4.0.
On Thu, Feb 2, 2017 at 7:58 AM, Hendrik Haddorp wrote:
> Might be that your overseer queue is overloaded. Similar to what is described
> here:
>
When I try to roll back from 6.4.0 to my original version of 6.0.1 it now
throws another issue. Now I can't go to 6.4.0 nor can I roll back to 6.0.1
Could not load codec 'Lucene62'. Did you forget to add
lucene-backward-codecs.jar?
at
Might be that your overseer queue is overloaded. Similar to what is
described here:
https://support.lucidworks.com/hc/en-us/articles/203959903-Bringing-up-downed-Solr-servers-that-don-t-want-to-come-up
If the overseer queue gets too long you get hit by this kind of problem.
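One way to confirm that is to count the queue's children directly in ZooKeeper; a
minimal sketch (hypothetical connect string), assuming the standard /overseer/queue
znode:

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class OverseerQueueDepth {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZK connect string; point it at your ensemble.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> {});
        // Every child of /overseer/queue is one pending cluster-state operation.
        List<String> pending = zk.getChildren("/overseer/queue", false);
        System.out.println("Overseer queue depth: " + pending.size());
        zk.close();
    }
}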
Following up on my previous email, the intermittent server unavailability
seems to be linked to the interaction between Solr and Zookeeper. Can
somebody help me understand what this error means and how to recover from
it?
2017-02-02 09:44:24.648 ERROR
Hello,
Yesterday I upgraded from 6.0.1 to 6.4.0, and it's been a straight 12
hours of debugging spree!! Can somebody kindly help me out of this misery?
I have a set of 8 single-shard collections with 3 replicas. As soon as I
updated the configs and started the servers, one of my collections got
like a leader
election issue?
2016-07-29 06:52:48.610 ERROR (coreZkRegister-1-thread-32-processing-s:shard2
x:tCollection_shard2_replica4 c:tCollection n:tsolr.prod2.xxx.com:8983_solr
r:core_node6) [c:tCollection s:shard2 r:core_node6
x:tCollection_shard2_replica4] o.a.s.c.ZkController Error
have seen
> whenever one of the boxes is leader in SolrCloud, the performance seems
> to be really good. However the leader election changes from time to time and
> most of the time the cloud boxes seem to process most of the traffic
> Currently our solrcloud looks something like this
>
to be really good. However the leader election changes from time to time and
most of the time the cloud boxes seem to process most of the traffic
Currently our solrcloud looks something like this
Physical Box 1
X ->shard 1 Clou
We get this error on one of our nodes:
Caused by: org.apache.solr.common.SolrException: There is conflicting
information about the leader of shard: shard2 our state says:
http://server01:8983/solr/collection/ but zookeeper says:
http://server02:8983/collection/
Then I noticed this in the log:
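When the local cluster state and ZooKeeper disagree like this, it can help to dump what
the Collections API currently reports and compare its leader entries with the leader
znodes in ZooKeeper. A minimal sketch over plain HTTP (hypothetical host and collection
name; the CLUSTERSTATUS action needs Solr 4.8 or later):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class DumpClusterStatus {
    public static void main(String[] args) throws Exception {
        // Hypothetical host and collection; adjust to your cluster.
        URL url = new URL("http://server01:8983/solr/admin/collections"
                + "?action=CLUSTERSTATUS&collection=collection1&wt=json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            // Prints the JSON cluster state: shards, replicas and which replica is flagged "leader":"true".
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}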
Hi,
I have 2 shards, 1 leader and 1 replica in each.
I've just removed a leader from one of the shards but the replica hasn't
become a leader yet.
How quickly should this normally happen?
tickTime=2000
dataDir=/home/rob/zoodata
clientPort=2181
initLimit=5
syncLimit=2
Thanks,
Rob
The leader election issue we were having was solved by passing
-Djava.net.preferIPv4Stack=true
to the ZooKeeper startup script.
It seems our Linux servers have IPv6 enabled but we have no IPv6 network.
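A quick way to check this on a given JVM is to print the flag and see which address
families your ZooKeeper hostnames resolve to; a minimal sketch (hypothetical hostname):

import java.net.Inet6Address;
import java.net.InetAddress;

public class CheckIpStack {
    public static void main(String[] args) throws Exception {
        // null means the flag was not passed to this JVM.
        System.out.println("java.net.preferIPv4Stack = "
                + System.getProperty("java.net.preferIPv4Stack"));
        // Hypothetical ZooKeeper hostname; use the names from your zkHost string.
        for (InetAddress addr : InetAddress.getAllByName("zk1.example.com")) {
            String family = (addr instanceof Inet6Address) ? "IPv6" : "IPv4";
            System.out.println(addr.getHostAddress() + " (" + family + ")");
        }
    }
}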
Hope this helps others.
Arcadius.
On 4 September 2015 at 04:57, Arcadius Ahouansou wrote:
Thanks for adding that to our collective knowledge store!
On Thu, Oct 22, 2015 at 2:44 AM, Arcadius Ahouansou
<arcad...@menelic.com> wrote:
> The leader election issue we were having was solved by passing
>
> -Djava.net.preferIPv4Stack=true
>
> to zookeeper startup script
We have a quorum of 3 ZK nodes zk1, zk2 and zk3.
All nodes are identical.
After multiple restarts of the ZK nodes, always keeping a majority of 2,
we have noticed that the node zk1 has never become the leader.
Only zk2 and zk3 become leader.
1) Is there any known reason or possible
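Which node wins is decided by ZooKeeper's own quorum election, and each server reports
its current role through the four-letter admin commands. A minimal sketch (hypothetical
hostnames) that asks every node for its "srvr" output and prints the "Mode:" line:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;

public class ZkMode {
    public static void main(String[] args) throws Exception {
        // Hypothetical hostnames; use your three ensemble members.
        for (String host : new String[] {"zk1", "zk2", "zk3"}) {
            try (Socket s = new Socket(host, 2181)) {
                OutputStream out = s.getOutputStream();
                out.write("srvr".getBytes("UTF-8"));   // four-letter admin command
                out.flush();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(s.getInputStream(), "UTF-8"));
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.startsWith("Mode:")) {     // "Mode: leader" or "Mode: follower"
                        System.out.println(host + " -> " + line);
                    }
                }
            }
        }
    }
}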
, all my collections are down.
I look in the logs and I can see problems of leader election, e.g.:
- Checking if I (core = test339_shard1_replica1, coreNodeName =
core_node5) should try and be the leader.
- Cloud says we are still state leader.
I feel that all servers pass the buck!
I do not understand this error especially
Hi Mike,
Yes, please open a new Jira issue and attach your patch there. We can
discuss more on the issue.
On Tue, Jul 28, 2015 at 11:40 AM, Michael Roberts mrobe...@tableau.com wrote:
Hey,
I am encountering an issue which looks a lot like
https://issues.apache.org/jira/browse/SOLR-6763.
Hey,
I am encountering an issue which looks a lot like
https://issues.apache.org/jira/browse/SOLR-6763.
However, it seems like the fix for that does not address the entire problem.
That fix will only work if we hit the zkClient.getChildren() call before the
reconnect logic has finished
Please, please, please do _not_ try to use core discovery to add new
replicas by manually editing stuff.
bq: and my deployment tools create an empty core on newly provisioned machines.
This is a really bad idea (as you have discovered). Basically, your
deployment tools have to do everything
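For reference, the supported route on 4.x is the Collections API ADDREPLICA action,
which creates the core and registers it with ZooKeeper in one step. A minimal sketch
over plain HTTP (hypothetical host, collection, shard and target node names):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class AddReplica {
    public static void main(String[] args) throws Exception {
        // Hypothetical host, collection, shard and target node; adjust to your deployment.
        URL url = new URL("http://solr1:8983/solr/admin/collections"
                + "?action=ADDREPLICA&collection=mycollection&shard=shard1"
                + "&node=solr2:8983_solr&wt=json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            // The response reports the name of the core created for the new replica.
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}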
Hi,
I have a SolrCloud setup, running 4.10.3. The setup consists of several cores,
each with a single shard and initially each shard has a single replica (so,
basically, one machine). I am using core discovery, and my deployment tools
create an empty core on newly provisioned machines.
The
Restarting the node cleared out the problem and everything recovered.
Thanks!
On 5/21/15 5:42 AM, Ramkumar R. Aiyengar wrote:
This shouldn't happen, but if it does, there's no good way currently for
Solr to automatically fix it. There are a couple of issues being worked on
to do that
This shouldn't happen, but if it does, there's no good way currently for
Solr to automatically fix it. There are a couple of issues being worked on
to do that currently. But till then, your best bet is to restart the node
which you expect to be the leader (you can look at ZK to see who is at the
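For reference, the election order for a shard lives under a per-shard znode in
ZooKeeper, and the entry with the lowest sequence number is next in line. A minimal
sketch (hypothetical connect string, collection and shard names) that lists it:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class ElectionQueue {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZK connect string, collection and shard; adjust to your cluster.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> {});
        String path = "/collections/mycollection/leader_elect/shard1/election";
        List<String> candidates = new ArrayList<>(zk.getChildren(path, false));
        // Node names end in a zero-padded sequence number; the lowest one is first in line.
        candidates.sort(Comparator.comparingLong(
                n -> Long.parseLong(n.substring(n.lastIndexOf('_') + 1))));
        for (String c : candidates) {
            System.out.println(c);
        }
        zk.close();
    }
}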
My SolrCloud cluster isn't reassigning the collection leaders from
downed cores--the downed cores are still listed as the leaders. The
cluster has been in this state for a few hours and the logs continue to
report "No registered leader was found after waiting for 4000ms". Is
there a way to force
Is it possible to elect the leader manually in SolrCloud 4.10.1?
-Sachin-
Not to my knowledge. There's quite a bit of work going on around
leader balancing, see the umbrella issue at
https://issues.apache.org/jira/browse/SOLR-6491.
That work won't quite do what you want, in the sense that you can't say
"nodeX, you become the leader", though. The way that set of operations
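For anyone landing on this thread later: that umbrella work eventually produced the
preferredLeader replica property (set with ADDREPLICAPROP) and a REBALANCELEADERS
collections action in newer Solr releases. A minimal sketch of the two HTTP calls
(hypothetical host, collection and replica names):

import java.net.HttpURLConnection;
import java.net.URL;

public class PreferredLeader {
    // Small helper that fires one Collections API call and reports the HTTP status.
    static void call(String query) throws Exception {
        URL url = new URL("http://solr1:8983/solr/admin/collections?" + query);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        System.out.println(query + " -> HTTP " + conn.getResponseCode());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical collection, shard and replica (core_node) names; adjust to your cluster.
        // 1) Mark the replica we would like to lead shard1.
        call("action=ADDREPLICAPROP&collection=mycollection&shard=shard1"
                + "&replica=core_node2&property=preferredLeader&property.value=true");
        // 2) Ask Solr to move leadership onto replicas carrying the preferredLeader property.
        call("action=REBALANCELEADERS&collection=mycollection&wt=json");
    }
}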
Thanks for the info. I will wait for the next release then. Will it come with
4.10.2?
I see something similar where, given ~1000 shards, both nodes spend a LOT of
time sorting through the leader election process. Roughly 30 minutes.
I too am wondering - if I force all leaders onto one node, then shut down both,
then start up the node with all of the leaders on it first
We have to fix that then.
--
Mark Miller
about.me/markrmiller
On April 15, 2014 at 12:20:03 PM, Rich Mayfield (mayfield.r...@gmail.com) wrote:
I see something similar where, given ~1000 shards, both nodes spend a LOT of
time sorting through the leader election process. Roughly 30 minutes
Is there a way to force leader election for a shard in SolrCloud? Is
there a way to break ties automatically (without restarting nodes) to make
a node the leader for the shard?
Thanks
Nitin
where both the
replicas for a shard get into recovering state and never come up, causing
the error "No servers hosting this shard". To fix this, I either unload one
core or restart one of the nodes again so that one of them becomes the
leader.
-thread-2] INFO
org.apache.solr.cloud.ShardLeaderElectionContext:runLeaderProcess:251 - I am
the new leader:
http://dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:8080/searchsolrnodefr/fr_green/shard1
Is it a bug with the leader election?
This problem does not occur:
- with the version 4.5.1.
- or if I start the four solr instances with a delay between them (about 15
seconds).
- or if I configure only one
No leader means we can't index data because a 503 http status code is
returned.
Is this the normal behaviour or a bug?
Best regards
(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
After that I checked the Solr admin page; leader election didn't get triggered
for that collection.
http://lucene.472066.n3.nabble.com/file/n4086259/Screenshot.png
I couldn't index into that collection but I can search from it.
Please help me with this issue.
Thanks in advance
Srivatsan
almost 15 minutes. After that I restarted the entire cluster. I am using Solr
4.4 with 1 shard and 3 replicas.
Any exceptions in the logs of other replicas? The default
leaderVoteWait time is 3 minutes after which a leader election should
have been initiated automatically.
On Fri, Aug 23, 2013 at 4:01 PM, Srivatsan ranjith.venkate...@gmail.com wrote:
almost 15 minutes. After that i restarted the entire
No exceptions. And the leaderVoteWait value will be used only during startup,
right? A new leader will be elected once the leader node is down. Am I right?
Yes, I have erased the tlog in replica 2 and it appears that the first
replica's tlog was corrupted because of an ungraceful servlet shutdown.
There was no log for it unfortunately, nor did the ZooKeeper log record
anything about this. Is there a place I could check in ZooKeeper what
bq: why does it replicate all the index instead of copying just the
newly formed segments
Because there's no guarantee that the segments are identical on the
nodes that make up a shard. The simplest way to conceptualize this
is to consider the autocommit settings on the servers. Let's say
the hard
discovered the disaster - on part of the
replicas that were in a replicating stage there was a wrong zookeeper
leader election - good state replicas (sub-cluster 1) replicated from empty
replicas (sub-cluster 2), ending up removing all documents in these
shards!!
These are the logs from solr-prod32 (sub-cluster #2 - bad state) - the
shard1_replica1 is elected to be leader although it was not before
Shouldn't it distribute
leaders? If I deliver some stress to a double-leader instance, is ZooKeeper
going to run an election?
https://issues.apache.org/jira/browse/SOLR-4900
do a restart. We just stop the leader and wait for a new leader. This
still results in a few "No registered leader was found" exceptions, but at
least the duration is short.
(ZkStateReader.java:399)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:318)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)