Update:

I did a full restart of the Solr cloud setup: stopped all the instances,
cleared down ZooKeeper and started them up individually.  I then removed the
index from one of the replicas, restarted Solr and it replicated OK.  So I'm
wondering whether this is something that develops over a period of time.
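
In case it's useful to anyone else, this is roughly the "clear down
ZooKeeper" step I performed.  It's only a sketch using the kazoo Python
client; the znode paths are the ones Solr 4.0 creates by default, the
ZooKeeper host is a placeholder, and every Solr instance needs to be
stopped before running it:

    # Sketch: wipe SolrCloud's state out of ZooKeeper.
    # All Solr nodes must be stopped first.  The host/port and the
    # znode paths are assumptions for illustration only.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1:2181")
    zk.start()
    for path in ("/clusterstate.json", "/live_nodes",
                 "/collections", "/overseer"):
        if zk.exists(path):
            zk.delete(path, recursive=True)
    zk.stop()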

Also, just to let you know, I changed the schema a couple of times and
reloaded the cores on all instances prior to the problem.  I don't know
whether this could have contributed.
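
For reference, the reload was just the standard CoreAdmin RELOAD call made
against each instance - something like this sketch, where the host list and
the core name are placeholders for our actual setup:

    # Sketch: reload the same core on every Solr instance via the
    # CoreAdmin API.  Hosts and core name are placeholders.
    import urllib.request

    hosts = ["solr1:8983", "solr2:8983", "solr3:8983", "solr4:8983"]
    for host in hosts:
        url = ("http://%s/solr/admin/cores"
               "?action=RELOAD&core=collection1" % host)
        print(urllib.request.urlopen(url).read())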

Thanks.

-----Original Message-----
From: Annette Newton [mailto:annette.new...@servicetick.com] 
Sent: 05 December 2012 09:04
To: solr-user@lucene.apache.org
Subject: RE: Replication error and Shard Inconsistencies..

Hi Mark,

Thanks so much for the reply.

We are using the release version of 4.0.

It's very strange: replication appears to be underway, but no files are
being copied across.  I have attached the log from the new node that I tried
to bring up, along with the schema and config we are using.

I think it's probably something weird with our config, so I'm going to play
around with it today.  If I make any progress I'll send an update.

Thanks again.

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: 05 December 2012 00:04
To: solr-user@lucene.apache.org
Subject: Re: Replication error and Shard Inconsistencies..

Hey Annette, 

Are you using Solr 4.0 final, or a build from the 4.x or 5.x branch?

Do you have the logs for when the replica tried to catch up to the leader?

Stopping and starting the node is actually a fine thing to do. Perhaps you
can try it again and capture the logs.

If a node is not listed as live but appears in the clusterstate, that is
fine; it shouldn't be consulted.  To remove it, you either have to unload it
with the core admin API or manually delete its registered state under the
node states node that the Overseer looks at.
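
To make the unload option concrete, it's just the UNLOAD action on the
CoreAdmin handler - a sketch, with the host and core name as placeholders:

    # Sketch: drop a dead replica's registration by unloading its core.
    # Host and core name are placeholders.
    import urllib.request

    url = ("http://solr1:8983/solr/admin/cores"
           "?action=UNLOAD&core=collection1")
    print(urllib.request.urlopen(url).read())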

Also, it would be useful to see the logs of the new node coming up.  There
should be info about what happens when it tries to replicate.

It almost sounds like replication is just not working for your setup at all,
and that you have to tweak some configuration.  You shouldn't see these
nodes as active in that case, though, so we should get to the bottom of
this.
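
One quick way to see whether replication is doing anything at all is to ask
the replica's ReplicationHandler for its details - a sketch, assuming the
handler is mapped at the default /replication path, with placeholder host
and core names:

    # Sketch: inspect replication state on a replica.
    # Assumes the default /replication handler path; host and core
    # name are placeholders.
    import urllib.request

    url = ("http://solr2:8983/solr/collection1"
           "/replication?command=details&wt=json")
    print(urllib.request.urlopen(url).read())

The same handler also accepts command=fetchindex, which forces a manual pull
from the leader - that may be relevant to your question about forcing a node
to replicate.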

- Mark

On Dec 4, 2012, at 4:37 AM, Annette Newton <annette.new...@servicetick.com>
wrote:

> Hi all,
>  
> I have a quite weird issue with Solr cloud.  I have a 4 shard, 2 replica
> setup; yesterday one of the nodes lost communication with the cloud setup,
> which resulted in it trying to run replication.  This failed, which has
> left me with a shard (Shard 4) that has 2,833,940 documents on the leader
> and 409,837 on the follower - obviously a big discrepancy, and this leads
> to queries returning differing results depending on which of these nodes
> the data comes from.  There is no indication of a problem on the admin
> site other than the big discrepancy in the number of documents.  They are
> all marked as active etc.
>  
> So I thought that I would force replication to happen again by stopping
> and starting Solr (probably the wrong thing to do), but this resulted in
> no change.  So I turned off that node and replaced it with a new one.  In
> ZooKeeper, live nodes doesn't list that machine, but it is still shown as
> active in the clusterstate.json; I have attached images showing this.
> This means the new node hasn't replaced the old node but is now a replica
> on Shard 1!  Also, that node doesn't appear to have replicated Shard 1's
> data anyway; it never got marked as replicating or anything.
>  
> How do I clear the ZooKeeper state without taking down the entire Solr
> cloud setup?  How do I force a node to replicate from the others in the
> shard?
>  
> Thanks in advance.
>  
> Annette Newton
>  
>  
> <LiveNodes.zip>


