Not sure, but maybe you are running out of file descriptors?
On each Solr instance, look at the "Dashboard" admin page; there is a
bar showing "File Descriptor Count".
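You can also check from the shell. A rough Python sketch (Linux only;
the Solr process PID is an assumption - pass your own, and run it as
the Solr user or root):

    import os
    import sys

    # Compare a process's open file descriptors against its soft limit
    # by reading /proc. Usage: python fdcheck.py <solr-pid>
    pid = sys.argv[1]

    open_fds = len(os.listdir(f"/proc/{pid}/fd"))

    soft_limit = None
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft_limit = int(line.split()[3])  # column 4 = soft limit
                break

    print(f"open fds: {open_fds} / soft limit: {soft_limit}")

If the count is near the soft limit, raising ulimit -n for the Solr
user would be the first thing to try.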

However, if that were the case, I would expect to see lots of errors in
the Solr logs...

André


On 12/05/2012 06:41 PM, Annette Newton wrote:
Sorry to bombard you - final update of the day...

One thing I have noticed is that we have a lot of connections between
the Solr boxes stuck in CLOSE_WAIT, and they hang around for ages.
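A quick way to count them is a minimal Python sketch that parses
/proc/net/tcp on Linux (IPv4 only; /proc/net/tcp6 would need the same
treatment):

    # Count IPv4 sockets in CLOSE_WAIT by parsing /proc/net/tcp.
    # Kernel state code 08 corresponds to CLOSE_WAIT.
    def count_close_wait(path="/proc/net/tcp"):
        count = 0
        with open(path) as f:
            next(f)  # skip the header line
            for line in f:
                if line.split()[3] == "08":  # column 4 is the state
                    count += 1
        return count

    if __name__ == "__main__":
        print(f"CLOSE_WAIT sockets: {count_close_wait()}")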

-----Original Message-----
From: Annette Newton [mailto:annette.new...@servicetick.com]
Sent: 05 December 2012 13:55
To: solr-user@lucene.apache.org
Subject: FW: Replication error and Shard Inconsistencies..

Update:

I did a full restart of the Solr cloud setup: stopped all the instances,
cleared down ZooKeeper and started them up individually.  I then removed the
index from one of the replicas, restarted Solr, and it replicated OK.  So I'm
wondering whether this is something that happens over a period of time.

Also, just to let you know, I changed the schema a couple of times and
reloaded the cores on all instances prior to the problem.  I don't know
whether this could have contributed to the problem.
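For reference, reloading a core goes through the CoreAdmin API - a rough
Python sketch, where the host, port and the core name "collection1" are
placeholders:

    import urllib.request

    # Reload a core via Solr's CoreAdmin API. Host, port and the core
    # name "collection1" are placeholders - substitute your own.
    url = "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"
    with urllib.request.urlopen(url) as resp:
        print(resp.read().decode())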

Thanks.

-----Original Message-----
From: Annette Newton [mailto:annette.new...@servicetick.com]
Sent: 05 December 2012 09:04
To: solr-user@lucene.apache.org
Subject: RE: Replication error and Shard Inconsistencies..

Hi Mark,

Thanks so much for the reply.

We are using the release version of 4.0.

It's very strange: replication appears to be under way, but no files are
being copied across.  I have attached both the log from the new node that I
tried to bring up and the schema and config we are using.
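To see what the replication handler thinks it is doing, its details
command can be queried - a rough Python sketch (host and the core name
"collection1" are placeholders):

    import urllib.request

    # Ask the replication handler for its current status. The host and
    # core name "collection1" are placeholders.
    url = ("http://localhost:8983/solr/collection1"
           "/replication?command=details&wt=json")
    with urllib.request.urlopen(url) as resp:
        print(resp.read().decode())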

I think it's probably something weird with our config, so I'm going to play
around with it today.  If I make any progress I'll send an update.

Thanks again.

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: 05 December 2012 00:04
To: solr-user@lucene.apache.org
Subject: Re: Replication error and Shard Inconsistencies..

Hey Annette,

Are you using Solr 4.0 final, or a build from the 4x or 5x branch?

Do you have the logs for when the replica tried to catch up to the leader?

Stopping and starting the node is actually a fine thing to do. Perhaps you
can try it again and capture the logs.

If a node is not listed as live but is in the cluster state, that is fine; it
shouldn't be consulted. To remove it, you either have to unload it with the
CoreAdmin API, or you could manually delete its registered state under the
node states node that the Overseer looks at.
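The unload route would look something like this (a rough Python sketch;
host and the core name "collection1" are placeholders):

    import urllib.request

    # Unload the stale core via the CoreAdmin API so its entry drops
    # out of the cluster state. Host and core name are placeholders.
    url = "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=collection1"
    with urllib.request.urlopen(url) as resp:
        print(resp.read().decode())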

Also, it would be useful to see the logs of the new node coming up. There
should be info about what happens when it tries to replicate.

It almost sounds like replication is just not working for your setup at all
and that you have to tweak some configuration. You shouldn't see these nodes
as active then though - so we should get to the bottom of this.

- Mark

On Dec 4, 2012, at 4:37 AM, Annette Newton <annette.new...@servicetick.com>
wrote:

Hi all,

I have a quite weird issue with Solr cloud.  I have a 4 shard, 2 replica
setup.  Yesterday one of the nodes lost communication with the cloud setup,
which resulted in it trying to run replication.  This failed, and has left
me with a shard (Shard 4) that has 2,833,940 documents on the leader and
409,837 on the follower - obviously a big discrepancy, and this leads to
queries returning differing results depending on which of these nodes
serves the data.  There is no indication of a problem on the admin site
other than the big discrepancy in the number of documents.  They are all
marked as active etc.

So I thought that I would force replication to happen again by stopping
and starting Solr (probably the wrong thing to do), but this resulted in no
change.  So I turned off that node and replaced it with a new one.  In
ZooKeeper, live nodes doesn't list that machine, but it is still being shown
as active in the ClusterState.json; I have attached images showing this.
This means the new node hasn't replaced the old node but is now a replica on
Shard 1!  Also, that node doesn't appear to have replicated Shard 1's data
anyway; it never got marked as replicating or anything.

How do I clear the ZooKeeper state without taking down the entire Solr
cloud setup?  How do I force a node to replicate from the others in the
shard?
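One way to force a pull from the leader is the replication handler's
fetchindex command - a rough Python sketch, where the hosts and the core
name "collection1" are placeholders; deleting the replica's index
directory and restarting, so normal recovery kicks in, is often the
simpler route in SolrCloud:

    import urllib.request

    # Tell a replica's replication handler to fetch the index from a
    # given leader URL. Hosts and core name are placeholders.
    url = ("http://replica-host:8983/solr/collection1/replication"
           "?command=fetchindex"
           "&masterUrl=http://leader-host:8983/solr/collection1/replication")
    with urllib.request.urlopen(url) as resp:
        print(resp.read().decode())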

Thanks in advance.

Annette Newton


<LiveNodes.zip>






--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/

Kelkoo SAS
Société par Actions Simplifiée (simplified joint-stock company)
Share capital: €4,168,964.30
Registered office: 8, rue du Sentier, 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively
for their addressees. If you are not the intended recipient of this message,
please destroy it and notify the sender.
