Hi,
I'm seeing the same issue on Solr 6.3 using HDFS with a replication factor of 3, although I believe a replication factor of 1 should behave the same way. When I stop a Solr instance, this is detected and Solr tries to create a replica on a different instance. However, the command for that fails:

o.a.s.c.OverseerAutoReplicaFailoverThread Exception trying to create new replica on http://...:9000/solr:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://...:9000/solr: Error CREATEing SolrCore 'test2.collection-09_shard1_replica1': Unable to create core [test2.collection-09_shard1_replica1] Caused by: No shard id for CoreDescriptor[name=test2.collection-09_shard1_replica1;instanceDir=/var/opt/solr/test2.collection-09_shard1_replica1]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
    at org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.createSolrCore(OverseerAutoReplicaFailoverThread.java:456)
    at org.apache.solr.cloud.OverseerAutoReplicaFailoverThread.lambda$addReplica$0(OverseerAutoReplicaFailoverThread.java:251)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Given that the data is on HDFS, it shouldn't matter whether any active replica is left: the data does not need to be transferred from another instance, because the new core simply takes over the existing data. Thus a replication factor of 1 should also work; in that case the shard would just be down until the new core is up. Anyhow, it looks like the call above either fails to set the shard id or some code is checking for it incorrectly.
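
For comparison, Chetas notes in the quoted mail that creating the replica by hand through the Collections API works. A minimal sketch of building such an ADDREPLICA request, which names the shard explicitly, might look like the following. The base URL is a placeholder, and the collection and shard names are only my guess from the core name in the error above; I have not verified that this avoids the failover problem.

```python
from urllib.parse import urlencode


def addreplica_url(base_url, collection, shard, node=None):
    """Build a Collections API ADDREPLICA URL with an explicit shard.

    base_url, collection, shard and node are placeholders chosen for
    illustration; substitute the values from your own cluster.
    """
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        # Optional: pin the new replica to a specific live node.
        params["node"] = node
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical host; collection and shard guessed from the core name.
print(addreplica_url("http://localhost:8983/solr", "test2.collection-09", "shard1"))
```

The resulting URL can be requested with curl or a browser; the point is only that the manual call carries the shard name, which the failing automatic call apparently does not.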

On 14.01.2017 02:44, Shawn Heisey wrote:
> On 1/13/2017 5:46 PM, Chetas Joshi wrote:
>> One of the things I have observed is: if I use the collection API to
>> create a replica for that shard, it does not complain about the config
>> which has been set to ReplicationFactor=1. If replication factor was
>> the issue as suggested by Shawn, shouldn't it complain?
> The replicationFactor value is used by exactly two things:  initial
> collection creation, and autoAddReplicas.  It will not affect ANY other
> command or operation, including ADDREPLICA.  You can create MORE
> replicas than replicationFactor indicates, and there will be no error
> messages or warnings.
>
> In order to have a replica automatically added, your replicationFactor
> must be at least two, and the number of active replicas in the cloud for
> a shard must be less than that number.  If that's the case and the
> expiration times have been reached without recovery, then Solr will
> automatically add replicas until there are at least as many replicas
> operational as specified in replicationFactor.
>
>> I would also like to mention that I experience some instance dirs
>> getting deleted and also found this open bug
>> (https://issues.apache.org/jira/browse/SOLR-8905)
> The description on that issue is incomprehensible.  I can't make any
> sense out of it.  It mentions the core.properties file, but the error
> message shown doesn't talk about the properties file at all.  The error
> and issue description seem to have nothing at all to do with the code
> lines that were quoted.  Also, it was reported on version 4.10.3 ... but
> this is going to be significantly different from current 6.x versions,
> and the 4.x versions will NOT be updated with bugfixes.
>
> Thanks,
> Shawn
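
To make the rule Shawn describes for autoAddReplicas concrete, here is a small sketch of how I read it. This is a simplified model of his description only, not Solr's actual code; the function name is made up, and it ignores the expiration timers he mentions.

```python
def replicas_to_add(replication_factor: int, active_replicas: int) -> int:
    """Number of replicas the described rule would add for one shard.

    Simplified model of the quoted description, not Solr's implementation:
    nothing is added unless replicationFactor is at least 2, and otherwise
    replicas are added until the active count reaches replicationFactor.
    """
    if replication_factor < 2:
        return 0
    return max(0, replication_factor - active_replicas)


# My setup: replicationFactor=3 with one instance stopped.
print(replicas_to_add(3, 2))  # 1
# Chetas' setup: replicationFactor=1 with the only replica down.
print(replicas_to_add(1, 0))  # 0
```

Under this reading, a replication factor of 1 never triggers a failover, which is the behaviour I am questioning above for HDFS-backed collections.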

