OK, it is clearer now.
You have 9 Solr nodes running, one per physical machine.
So each node hosts a number of cores (both replicas and leaders).
When the node died, a lot of indexes got corrupted.
I still don't understand why you restarted the other 8 working nodes (I was
expecting you to restart only the failed one).

When you mention that only one replica is failing, do you mean that the Solr
node is up and running and only one Solr core (the replica of one shard)
keeps failing?
Or are all the local cores on that node failing to recover?
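
One way to tell the two cases apart is to group the non-active replicas by
node in the output of the Collections API CLUSTERSTATUS action. The following
is a minimal sketch, assuming the JSON response has already been fetched and
parsed into a dict. The function name, collection name, host names and sample
data are hypothetical and only illustrate the shape of the check.

```python
from collections import defaultdict


def non_active_replicas(cluster_status, collection):
    """Group replicas whose state is not 'active' by the node hosting them."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    by_node = defaultdict(list)
    for shard_name, shard in shards.items():
        for replica_name, replica in shard["replicas"].items():
            state = replica.get("state")
            if state != "active":
                by_node[replica["node_name"]].append(
                    (shard_name, replica_name, state)
                )
    return dict(by_node)


# Hypothetical sample shaped like a CLUSTERSTATUS response (not real data).
sample = {
    "cluster": {
        "collections": {
            "mycollection": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"state": "active",
                                           "node_name": "host1:8983_solr"},
                            "core_node2": {"state": "active",
                                           "node_name": "host2:8983_solr"},
                            "core_node3": {"state": "recovering",
                                           "node_name": "host9:8983_solr"},
                        }
                    },
                    "shard2": {
                        "replicas": {
                            "core_node4": {"state": "active",
                                           "node_name": "host9:8983_solr"},
                        }
                    },
                }
            }
        }
    }
}

stuck = non_active_replicas(sample, "mycollection")
for node, replicas in stuck.items():
    print(node, replicas)
```

If a single node lists only one entry while its other cores are active, the
problem is limited to that one core; if every core on the node shows up, the
whole node is failing to recover.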

Cheers

On 1 Feb 2017 6:07 p.m., "Joe Obernberger" <joseph.obernber...@gmail.com>
wrote:

Thank you for the response.
There are no virtual machines in the configuration.  The collection has 45
shards with 3 replicas each spread across the 9 physical boxes; each box is
running one copy of solr.  I've tried to restart just the one node after
the other 8 (and all their shards/replicas) came up, but this one replica
seems to be in perma-recovery.

Shard Count: 45
replicationFactor: 3
maxShardsPerNode: 50
router: compositeId
autoAddReplicas: false
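
As a rough sanity check on these settings, the number of cores each box hosts
can be worked out as below. This is only a sketch: it assumes the replicas are
spread evenly across the 9 nodes, which the settings themselves do not
guarantee.

```python
# Values taken from the collection settings above.
shards = 45
replication_factor = 3
nodes = 9
max_shards_per_node = 50

total_cores = shards * replication_factor   # every replica is one core
cores_per_node = total_cores / nodes        # assumes an even spread

print(total_cores, cores_per_node)          # prints: 135 15.0

# An even spread stays well under the maxShardsPerNode limit.
assert cores_per_node <= max_shards_per_node
```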

SOLR_JAVA_MEM options are -Xms16g -Xmx32g

_TUNE is:
"-XX:+UseG1GC \
-XX:MaxDirectMemorySize=8g \
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=32m \
-XX:MaxGCPauseMillis=500 \
-XX:InitiatingHeapOccupancyPercent=75 \
-XX:ParallelGCThreads=16 \
-XX:+UseLargePages \
-XX:-ResizePLAB \
-XX:+AggressiveOpts"

So far it has retried 22 times.  The cluster is accessible and OK, but I'm
afraid to continue indexing data if this one node will never come back.
Thanks for help!

-Joe



On 2/1/2017 12:58 PM, alessandro.benedetti wrote:

> Let me try to summarize.
> How many virtual machines are on top of the 9 physical ones?
> How many Solr processes (replicas?)
>
> If you had 1 node compromised,
> I assume you have replicas as well, right?
>
> Can you explain your replica configuration a little better?
> Why did you have to stop all the nodes?
>
> I would expect you to stop the failing Solr node, clean up the index and
> restart.
> It would then recover automatically from the leader.
>
> Something is suspicious here, let us know!
>
> Cheers
>
>
>
> -----
> ---------------
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-6-3-0-recovery-failed-tp4318324p4318327.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
