Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-02-22 Thread Hendrik Haddorp
I'm also not really an HDFS expert, but I believe it is slightly different: the HDFS data is replicated, let's say 3 times, between the HDFS data nodes, but to an HDFS client it looks like one directory, and the replication is hidden. Every client should see the same data. Just

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-02-22 Thread Erick Erickson
bq: in the non-HDFS case that sounds logical but in the HDFS case all the index data is in the shared HDFS file system. That's not really the point, and it's not quite true. The Solr index is unique _per replica_. So replica1 points to an HDFS directory (that's triply replicated to be sure).

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-02-21 Thread Hendrik Haddorp
Hi Erick, in the non-HDFS case that sounds logical, but in the HDFS case all the index data is in the shared HDFS file system. Even the transaction logs should be in there. So the node that once had the replica should not really have more information than any other node, especially if

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-02-21 Thread Erick Erickson
Hendrik: bq: Not really sure why one replica needs to be up though. I didn't write the code so I'm guessing a bit, but consider the situation where you have no replicas for a shard up and add a new one. Eventually it could become the leader but there would have been no chance for it to check if

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-02-21 Thread Hendrik Haddorp
Hi, I had opened SOLR-10092 (https://issues.apache.org/jira/browse/SOLR-10092) for this a while ago. I was now able to get this feature working with a very small code change. After a few seconds Solr reassigns the replica to a different Solr instance as long as one replica is still up. Not

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-19 Thread Hendrik Haddorp
HDFS is like a shared filesystem so every Solr Cloud instance can access the data using the same path or URL. The clusterstate.json looks like this: "shards":{"shard1":{ "range":"8000-7fff", "state":"active", "replicas":{ "core_node1":{

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-19 Thread Shawn Heisey
On 1/19/2017 4:09 AM, Hendrik Haddorp wrote: > Given that the data is on HDFS it shouldn't matter if any active > replica is left as the data does not need to get transferred from > another instance but the new core will just take over the existing > data. Thus a replication factor of 1 should

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-19 Thread Hendrik Haddorp
Hi, I'm seeing the same issue on Solr 6.3 using HDFS and a replication factor of 3, even though I believe a replication factor of 1 should work the same. When I stop a Solr instance this is detected and Solr actually wants to create a replica on a different instance. The command for that does

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-13 Thread Shawn Heisey
On 1/13/2017 5:46 PM, Chetas Joshi wrote: > One of the things I have observed is: if I use the collection API to > create a replica for that shard, it does not complain about the config > which has been set to ReplicationFactor=1. If replication factor was > the issue as suggested by Shawn,

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-13 Thread Chetas Joshi
Erick, I have not changed any config. I have autoaddReplica = true for individual collection config as well as the overall cluster config. Still, it does not add a replica when I decommission a node. Adding a replica is overseer's job. I looked at the logs of the overseer of the solrCloud but

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-12 Thread Erick Erickson
Hmmm, have you changed any of the settings for autoAddReplicas? There are several parameters that govern how long before a replica would be added. But I suggest you use the Cloudera resources for this question; not only did they write this functionality, but Cloudera support is deeply embedded in

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-12 Thread Shawn Heisey
On 1/11/2017 7:14 PM, Chetas Joshi wrote: > This is what I understand about how Solr works on HDFS. Please correct me > if I am wrong. > > Although solr shard replication Factor = 1, HDFS default replication = 3. > When the node goes down, the solr server running on that node goes down and > hence

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-11 Thread Shawn Heisey
On 1/11/2017 1:47 PM, Chetas Joshi wrote: > I have deployed a SolrCloud (solr 5.5.0) on hdfs using cloudera 5.4.7. The > cloud has 86 nodes. > > This is my config for the collection > > numShards=80 > ReplicationFactor=1 > maxShardsPerNode=1 > autoAddReplica=true > > I recently decommissioned a

Solr on HDFS: AutoAddReplica does not add a replica

2017-01-11 Thread Chetas Joshi
Hello, I have deployed a SolrCloud (Solr 5.5.0) on HDFS using Cloudera 5.4.7. The cloud has 86 nodes. This is my config for the collection: numShards=80, ReplicationFactor=1, maxShardsPerNode=1, autoAddReplica=true. I recently decommissioned a node to resolve some disk issues. The shard that was
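
For readers following the thread, here is a minimal sketch of how the collection settings quoted in the original post might be expressed as a Collections API CREATE request. It only builds the URL and does not contact a server. The base URL, the collection name, and the helper name build_create_url are hypothetical. The post writes ReplicationFactor and autoAddReplica; the sketch assumes the Collections API spellings replicationFactor and autoAddReplicas, which should be checked against the reference guide for the Solr version in use.

```python
from urllib.parse import urlencode


def build_create_url(base_url, collection, num_shards=80,
                     replication_factor=1, max_shards_per_node=1,
                     auto_add_replicas=True):
    """Build a Collections API CREATE URL (hypothetical helper).

    Defaults mirror the values quoted in the original post. The parameter
    spellings are an assumption based on the Solr Collections API, not
    something stated in the thread.
    """
    params = {
        "action": "CREATE",
        "name": collection,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        # Solr expects lowercase true/false in query parameters.
        "autoAddReplicas": str(auto_add_replicas).lower(),
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and collection name, for illustration only.
    print(build_create_url("http://localhost:8983/solr", "mycollection"))
```

Building the query string with urlencode keeps the parameter names in one place, which makes a misspelled setting easier to spot before the request is sent.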