Re: Solr on HDFS: AutoAddReplica does not add a replica

Hendrik Haddorp Thu, 19 Jan 2017 03:09:39 -0800

Hi,

I'm seeing the same issue on Solr 6.3 using HDFS and a replication factor of 3, even though I believe a replication factor of 1 should work the same. When I stop a Solr instance this is detected and Solr actually wants to create a replica on a different instance. The command for that does however fail:

o.a.s.c.OverseerAutoReplicaFailoverThread Exception trying to create new replica on http://...:9000/solr:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://...:9000/solr: Error CREATEing SolrCore 'test2.collection-09_shard1_replica1': Unable to create core [test2.collection-09_shard1_replica1] Caused by: No shard id for CoreDescriptor[name=test2.collection-09_shard1_replica1;instanceDir=/var/opt/solr/test2.collection-09_shard1_replica1]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
    at org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.createSolrCore(OverseerAutoReplicaFailoverThread.java:456)
    at org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.lambda$addReplica$0(OverseerAutoReplicaFailoverThread.java:251)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Given that the data is on HDFS, it shouldn't matter whether any active replica is left: the data does not need to be transferred from another instance, and the new core will just take over the existing data. Thus a replication factor of 1 should also work; in that case the shard would simply be down until the new core is up. Anyhow, it looks like the above call fails to set the shard id, I guess, or some code is checking it wrongly.


On 14.01.2017 02:44, Shawn Heisey wrote:

On 1/13/2017 5:46 PM, Chetas Joshi wrote:

One of the things I have observed is: if I use the collection API to
create a replica for that shard, it does not complain about the config
which has been set to ReplicationFactor=1. If replication factor was
the issue as suggested by Shawn, shouldn't it complain?

The replicationFactor value is used by exactly two things:  initial
collection creation, and autoAddReplicas.  It will not affect ANY other
command or operation, including ADDREPLICA.  You can create MORE
replicas than replicationFactor indicates, and there will be no error
messages or warnings.
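
For reference, a minimal sketch of what such a manual ADDREPLICA request to the Collections API could look like when built by hand. The base URL http://localhost:8983/solr is a placeholder and the class and method names are hypothetical; the collection and shard names are taken from the error message in this thread. It only assembles the request URL and does not send it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a Collections API ADDREPLICA URL.
// The shard is passed explicitly, so the request itself names the target shard.
final class AddReplicaUrl {
    private AddReplicaUrl() {}

    static String build(String solrBaseUrl, String collection, String shard) {
        return solrBaseUrl + "/admin/collections?action=ADDREPLICA"
                + "&collection=" + encode(collection)
                + "&shard=" + encode(shard);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder base URL; adjust to the actual Solr node.
        System.out.println(build("http://localhost:8983/solr", "test2.collection-09", "shard1"));
    }
}
```

As described above, a request like this is not checked against replicationFactor, so it can be used regardless of the value the collection was created with.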

In order to have a replica automatically added, your replicationFactor
must be at least two, and the number of active replicas in the cloud for
a shard must be less than that number.  If that's the case and the
expiration times have been reached without recovery, then Solr will
automatically add replicas until there are at least as many replicas
operational as specified in replicationFactor.
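
The conditions above can be summarized in a small sketch. This is only a paraphrase of the rule as described in this message, not Solr's actual failover code; the class, method, and parameter names are hypothetical, and "waitExpired" stands in for "the expiration times have been reached without recovery".

```java
// Illustrative only: restates the rule from the text, not Solr's implementation.
final class AutoAddReplicaRule {
    private AutoAddReplicaRule() {}

    // True when a replica would be added automatically under the described rule.
    static boolean shouldAddReplica(int replicationFactor, int activeReplicas, boolean waitExpired) {
        return replicationFactor >= 2
                && activeReplicas < replicationFactor
                && waitExpired;
    }

    // Number of replicas needed to get back up to replicationFactor, or 0 if the rule does not apply.
    static int replicasToAdd(int replicationFactor, int activeReplicas, boolean waitExpired) {
        if (!shouldAddReplica(replicationFactor, activeReplicas, waitExpired)) {
            return 0;
        }
        return replicationFactor - activeReplicas;
    }

    public static void main(String[] args) {
        System.out.println(replicasToAdd(3, 2, true));  // one replica lost, wait expired
        System.out.println(replicasToAdd(1, 0, true));  // replicationFactor=1 never qualifies
        System.out.println(replicasToAdd(3, 1, false)); // still within the wait period
    }
}
```

Under this reading, a collection with replicationFactor=1 never triggers an automatic add, no matter how many replicas are down.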

I would also like to mention that I experience some instance dirs
getting deleted and also found this open bug
(https://issues.apache.org/jira/browse/SOLR-8905)

The description on that issue is incomprehensible.  I can't make any
sense out of it.  It mentions the core.properties file, but the error
message shown doesn't talk about the properties file at all.  The error
and issue description seem to have nothing at all to do with the code
lines that were quoted.  Also, it was reported on version 4.10.3 ... but
this is going to be significantly different from current 6.x versions,
and the 4.x versions will NOT be updated with bugfixes.

Thanks,
Shawn
