Folks:

I have a question about SolrCloud shard numbers (e.g. shard<x>) and their
associated hash ranges. We are trying to identify the best strategy for
merging shards in chronologically older collections, which see a very low
volume of searches compared to the collections holding the most recent
data.

What we have found is that shard numbers and hash ranges often don't seem
to correlate:

shard1: 80000000-aaa9ffff
shard2: aaaa0000-d554ffff
shard3: d5550000-ffffffff (holds the last range)
shard4: 0-2aa9ffff (holds the starting range)
shard5: 2aaa0000-5554ffff
shard6: 55550000-7fffffff
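
Reading the hex values as signed 32-bit integers (Solr hashes document ids to Java ints), the list above is actually in hash order: 80000000 is the smallest value, ffffffff is -1, and 0 comes right after it, so shardN looks like simply the Nth slice counting up from the minimum. As a rough check, the Python sketch below splits the signed 32-bit space into n nearly equal ranges and snaps each internal boundary down to a multiple of 0x10000. It is an approximation written to reproduce this particular listing, not Solr's own code (the real logic lives in Solr's DocRouter and may round differently for other shard counts); `partition`, `to_hex`, and the `BLOCK` granularity are hypothetical.

```python
# Hypothetical sketch, NOT Solr's source: split the signed 32-bit hash space
# into n contiguous ranges, snapping each internal boundary down to a multiple
# of 0x10000. Written to reproduce the six ranges listed in this post.

INT_MIN, INT_MAX = -2**31, 2**31 - 1
BLOCK = 0x10000  # assumed boundary granularity (the low 16 bits)


def to_hex(v: int) -> str:
    """Format a signed 32-bit value as unpadded hex, as in clusterstate.json."""
    return format(v & 0xFFFFFFFF, "x")


def partition(n: int) -> list[tuple[int, int]]:
    """Return n inclusive (start, end) ranges covering INT_MIN..INT_MAX."""
    step = (INT_MAX - INT_MIN) // n
    ranges = []
    start = INT_MIN
    target_start = INT_MIN  # idealized start, so rounding errors don't accumulate
    for i in range(n):
        target_end = target_start + step
        if i == n - 1:
            end = INT_MAX  # the last range always ends exactly at INT_MAX
        else:
            # Snap down so that the next range starts on a BLOCK boundary.
            end = (target_end // BLOCK) * BLOCK - 1
        ranges.append((start, end))
        start = end + 1
        target_start = target_end + 1
    return ranges


for number, (lo, hi) in enumerate(partition(6), start=1):
    print(f"shard{number}: {to_hex(lo)}-{to_hex(hi)}")
```

Python's `//` floors negative values, and because 2^32 is a multiple of 0x10000, snapping the signed value gives the same boundary as snapping its unsigned hex form.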


The same goes for core_node<x>: its numbering does not follow the hash
order, nor does it correlate with shard<x>. For example, core_node1 does not
hold the keys starting from 0, nor does it map to shard1.

{"shard1"=>
  {"range"=>"80000000-aaa9ffff",
   "replicas"=>{"core_node5"=>{"core"=>"post_NW_201508_shard1_replica1"}}},
 "shard2"=>
  {"range"=>"aaaa0000-d554ffff",
   "replicas"=>{"core_node6"=>{"core"=>"post_NW_201508_shard2_replica1"}}},
 "shard3"=>
  {"range"=>"d5550000-ffffffff",
   "replicas"=>{"core_node2"=>{"core"=>"post_NW_201508_shard3_replica1"}}},
 "shard4"=>
  {"range"=>"0-2aa9ffff",
   "replicas"=>{"core_node3"=>{"core"=>"post_NW_201508_shard4_replica1"}}},
 "shard5"=>
  {"range"=>"2aaa0000-5554ffff",
   "replicas"=>{"core_node4"=>{"core"=>"post_NW_201508_shard5_replica1"}}},
 "shard6"=>
  {"range"=>"55550000-7fffffff",
   "replicas"=>{"core_node1"=>{"core"=>"post_NW_201508_shard6_replica1"}}}}
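
To see how the names line up, the Python sketch below re-types the excerpt above as a dict (only the fields shown), sorts the replicas by the signed start of their range, and checks that the six ranges tile the whole 32-bit space. Sorted that way, the shard names come out in order while the core_node names do not, which suggests (an assumption based only on this listing) that core_node<x> is an arbitrary replica identifier rather than a position in the hash space. `CLUSTER_STATE`, `signed32`, `parse_range`, `replicas_in_hash_order`, and `is_contiguous` are hypothetical names.

```python
# Hypothetical sketch: the clusterstate excerpt above re-typed as a Python dict
# (only the fields shown there), sorted by where each shard's range starts.

CLUSTER_STATE = {
    "shard1": {"range": "80000000-aaa9ffff",
               "replicas": {"core_node5": {"core": "post_NW_201508_shard1_replica1"}}},
    "shard2": {"range": "aaaa0000-d554ffff",
               "replicas": {"core_node6": {"core": "post_NW_201508_shard2_replica1"}}},
    "shard3": {"range": "d5550000-ffffffff",
               "replicas": {"core_node2": {"core": "post_NW_201508_shard3_replica1"}}},
    "shard4": {"range": "0-2aa9ffff",
               "replicas": {"core_node3": {"core": "post_NW_201508_shard4_replica1"}}},
    "shard5": {"range": "2aaa0000-5554ffff",
               "replicas": {"core_node4": {"core": "post_NW_201508_shard5_replica1"}}},
    "shard6": {"range": "55550000-7fffffff",
               "replicas": {"core_node1": {"core": "post_NW_201508_shard6_replica1"}}},
}


def signed32(hex_str: str) -> int:
    """Interpret an unpadded hex string as a signed 32-bit integer."""
    value = int(hex_str, 16)
    return value - 2**32 if value >= 2**31 else value


def parse_range(text: str) -> tuple[int, int]:
    """Turn 'lo-hi' into an inclusive (lo, hi) pair of signed ints."""
    lo, hi = text.split("-")
    return signed32(lo), signed32(hi)


def replicas_in_hash_order(state: dict) -> list[tuple[str, str, str]]:
    """List (shard, core_node, range) tuples sorted by signed range start."""
    rows = [(shard, core_node, info["range"])
            for shard, info in state.items()
            for core_node in info["replicas"]]
    return sorted(rows, key=lambda row: parse_range(row[2])[0])


def is_contiguous(state: dict) -> bool:
    """True if the ranges tile the whole signed 32-bit space without gaps."""
    spans = sorted(parse_range(info["range"]) for info in state.values())
    if spans[0][0] != -2**31 or spans[-1][1] != 2**31 - 1:
        return False
    return all(a[1] + 1 == b[0] for a, b in zip(spans, spans[1:]))


for shard, core_node, rng in replicas_in_hash_order(CLUSTER_STATE):
    print(f"{rng:>18}  {shard}  {core_node}")
```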


Why is this a concern?

   1. Let's say we merge the indexes of adjacent shards to reduce the
   number of shards in the collection. In this case we would be merging
   "core_node3: 0-2aa9ffff" and "core_node4: 2aaa0000-5554ffff". What number
   would the new core_node directory get (core_node<?>)?
   2. After we recreate the collection with fewer shards and copy this data
   over to the cluster, how would the cluster infer the hash range from the
   index data? Or how would it reconcile the index with the shard metadata
   in the local filesystem of the cluster nodes?
   3. How should we approach this problem to guarantee that Solr picks up
   the right key order from the merged indexes?
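
One way to reason about questions 1 and 2 is to treat the hash range, not the core_node name, as the identity of a shard: two indexes can only be merged into one shard if their ranges are contiguous, and presumably the union should equal one of the ranges the recreated collection is assigned. The Python sketch below checks that. `merge`, `shard_for_hash`, and especially `NEW_LAYOUT` are hypothetical: the three-shard layout is an assumption, not observed output, and the real ranges should be read from the new collection's clusterstate.json after it is created.

```python
# Hypothetical sketch: check whether two adjacent shard ranges, once merged,
# coincide with one of the ranges of a recreated collection with fewer shards.


def signed32(hex_str: str) -> int:
    """Interpret an unpadded hex string as a signed 32-bit integer."""
    value = int(hex_str, 16)
    return value - 2**32 if value >= 2**31 else value


def parse_range(text: str) -> tuple[int, int]:
    """Turn 'lo-hi' into an inclusive (lo, hi) pair of signed ints."""
    lo, hi = text.split("-")
    return signed32(lo), signed32(hi)


def merge(a: str, b: str) -> tuple[int, int]:
    """Union of two inclusive ranges; raises ValueError if they are not adjacent."""
    (a_lo, a_hi), (b_lo, b_hi) = sorted([parse_range(a), parse_range(b)])
    if a_hi + 1 != b_lo:
        raise ValueError(f"{a} and {b} are not adjacent")
    return a_lo, b_hi


def shard_for_hash(h: int, layout: dict[str, str]) -> str:
    """Name of the shard whose inclusive range contains the signed hash h."""
    for shard, text in layout.items():
        lo, hi = parse_range(text)
        if lo <= h <= hi:
            return shard
    raise ValueError(f"hash {h} is not covered by any range")


# ASSUMED layout of a recreated 3-shard collection (not observed output);
# take the real ranges from the new collection's clusterstate.json instead.
NEW_LAYOUT = {"shard1": "80000000-d554ffff",
              "shard2": "d5550000-2aa9ffff",
              "shard3": "2aaa0000-7fffffff"}
targets = {parse_range(text) for text in NEW_LAYOUT.values()}

proposed = merge("0-2aa9ffff", "2aaa0000-5554ffff")     # core_node3 + core_node4
alternative = merge("d5550000-ffffffff", "0-2aa9ffff")  # shard3 + shard4
print("core_node3 + core_node4 fits a new shard:", proposed in targets)
print("shard3 + shard4 fits a new shard:", alternative in targets)
print("hash 0 would route to:", shard_for_hash(0, NEW_LAYOUT))
```

Under that assumed layout, the core_node3 + core_node4 pair from question 1 would not line up with any new shard, while shard3 + shard4 would, because d5550000-ffffffff ends at -1 and 0-2aa9ffff starts at 0.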



*Solr 4.4*
*HDFS for Index Storage*
