Hi,
I had opened SOLR-10092
(https://issues.apache.org/jira/browse/SOLR-10092) for this a while ago.
I was now able to get this feature working with a very small code change.
After a few seconds Solr reassigns the replica to a different Solr
instance, as long as one replica is still up. I'm not really sure why one
replica needs to be up, though. I added the patch, based on Solr 6.3, to
the bug report. It would be great if it could be merged soon.
regards,
Hendrik
On 19.01.2017 17:08, Hendrik Haddorp wrote:
HDFS is like a shared filesystem so every Solr Cloud instance can
access the data using the same path or URL. The clusterstate.json
looks like this:
"shards":{"shard1":{
"range":"80000000-7fffffff",
"state":"active",
"replicas":{
"core_node1":{
"core":"test1.collection-0_shard1_replica1",
"dataDir":"hdfs://master...:8000/test1.collection-0/core_node1/data/",
"base_url":"http://slave3....:9000/solr",
"node_name":"slave3....:9000_solr",
"state":"active",
"ulogDir":"hdfs://master....:8000/test1.collection-0/core_node1/data/tlog"},
"core_node2":{
"core":"test1.collection-0_shard1_replica2",
"dataDir":"hdfs://master....:8000/test1.collection-0/core_node2/data/",
"base_url":"http://slave2....:9000/solr",
"node_name":"slave2....:9000_solr",
"state":"active",
"ulogDir":"hdfs://master....:8000/test1.collection-0/core_node2/data/tlog",
"leader":"true"},
"core_node3":{
"core":"test1.collection-0_shard1_replica3",
"dataDir":"hdfs://master....:8000/test1.collection-0/core_node3/data/",
"base_url":"http://slave4....:9005/solr",
"node_name":"slave4....:9005_solr",
"state":"active",
"ulogDir":"hdfs://master....:8000/test1.collection-0/core_node3/data/tlog"}}}}
So every replica is always assigned to one node, and this assignment is
stored in ZK, pretty much the same as for non-HDFS setups. But since the
data is not stored locally but on the network, and the path does not
contain any node information, a different Solr node can of course easily
take over the work. You should just need to update the owner of the
replica in ZK and be basically done, I assume. That's why the
documentation states that an advantage of using HDFS is that a failing
node can be replaced by a different one. The Overseer just has to move
the ownership of the replica, which seems to be what the code is trying
to do. There just seems to be a bug in the code, so that the core does
not get created on the target node.
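What "moving the ownership" would amount to can be sketched roughly as follows (Python, a hypothetical helper with placeholder node names — not Solr's actual Overseer code): only the ownership fields change, while dataDir and ulogDir stay untouched because the index itself never moves:

```python
def reassign_replica(replica: dict, new_node: str) -> dict:
    """Sketch: point a replica at a different Solr node.

    Only the ownership fields (node_name, base_url) change; the HDFS
    dataDir/ulogDir are left alone, since the index does not move.
    In real Solr the Overseer must also create the core on the target
    node, which is the step that appears to be failing (SOLR-10092).
    """
    host, rest = new_node.split(":", 1)   # e.g. "slave5:9000_solr"
    port = rest.split("_", 1)[0]
    updated = dict(replica)
    updated["node_name"] = new_node
    updated["base_url"] = f"http://{host}:{port}/solr"
    return updated

# Placeholder replica entry modeled on the clusterstate above.
replica = {
    "core": "test1.collection-0_shard1_replica1",
    "dataDir": "hdfs://master:8000/test1.collection-0/core_node1/data/",
    "base_url": "http://slave3:9000/solr",
    "node_name": "slave3:9000_solr",
    "state": "active",
}
moved = reassign_replica(replica, "slave5:9000_solr")
assert moved["dataDir"] == replica["dataDir"]  # the data stays in place
print(moved["base_url"])
```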
Each data directory also contains a lock file. The documentation
states that one should use the HdfsLockFactory, which unfortunately
can easily lead to SOLR-8335; hopefully that will be fixed by
SOLR-8169. A manual cleanup is also easily done, but it seems to
require a node restart to take effect. Then again, I have only
recently started playing around with all this ;-)
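The manual cleanup just means removing the stale lock file from the replica's index directory (for HDFS, something like `hdfs dfs -rm` on that path). A sketch of the idea, using a local temp directory as a stand-in for the HDFS data dir and assuming Lucene's default lock file name write.lock:

```python
import os
import tempfile

def remove_stale_lock(index_dir: str, lock_name: str = "write.lock") -> bool:
    """Delete a leftover lock file if present; returns True if removed.

    Local stand-in for removing <dataDir>/index/write.lock on HDFS:
    after a crash the HdfsLockFactory lock can be left behind
    (SOLR-8335) and the core cannot reopen the index until the stale
    lock is removed.
    """
    lock_path = os.path.join(index_dir, lock_name)
    if os.path.exists(lock_path):
        os.remove(lock_path)
        return True
    return False

# Demo: a temp directory stands in for the HDFS index directory.
index_dir = tempfile.mkdtemp()
open(os.path.join(index_dir, "write.lock"), "w").close()  # simulate crash leftover
removed = remove_stale_lock(index_dir)
print("lock removed:", removed)
```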
regards,
Hendrik
On 19.01.2017 16:40, Shawn Heisey wrote:
On 1/19/2017 4:09 AM, Hendrik Haddorp wrote:
Given that the data is on HDFS, it shouldn't matter whether any active
replica is left, as the data does not need to be transferred from
another instance; the new core can just take over the existing data.
Thus a replication factor of 1 should also work; in that case the
shard would simply be down until the new core is up. Anyhow, it looks
like the call above fails to set the shard id, or some code is
checking for it incorrectly.
I know very little about how SolrCloud interacts with HDFS, so although
I'm reasonably certain about what comes below, I could be wrong.
I have never heard of SolrCloud being able to automatically take over
an existing index directory when it creates a replica, or even share
index directories, unless the admin fools it into doing so without its
knowledge. Sharing an index directory between replicas in SolrCloud
would NOT work correctly. Solr must be able to update all replicas
independently, which means that each of them will lock its index
directory and write to it.
It is my understanding (from reading messages on mailing lists) that
when using HDFS, Solr replicas are all separate and consume additional
disk space, just like on a regular filesystem.
I found the code that generates the "No shard id" exception, but my
knowledge of how the zookeeper code in Solr works is not deep enough to
understand what it means or how to fix it.
Thanks,
Shawn