Hi,

I am indexing documents into a 10-shard collection (testcollection, with no
replicas) in a Solr 6 cluster using CloudSolrClient. Looking at the Solr
logs, I see a lot of peer-to-peer document forwarding going on between
nodes.

An example log statement is as follows:
2017-06-01 06:07:28.378 INFO  (qtp1358444045-3673692) [c:testcollection
s:shard8 r:core_node7 x:testcollection_shard8_replica1]
o.a.s.u.p.LogUpdateProcessorFactory [testcollection_shard8_replica1]
 webapp=/solr path=/update params={update.distrib=TOLEADER&distrib.from=
http://10.199.42.29:8983/solr/testcollection_shard7_replica1/&wt=javabin&version=2}{add=[BQECDwZGTCEBHZZBBiIP
(1568981383488995328), BQEBBQZB2il3wGT/0/mB (1568981383490043904),
BQEBBQZFnhOJRj+m9RJC (1568981383491092480), BQEGBgZIeBE1klHS4fxk
(1568981383492141056), BQEBBQZFVTmRx2VuCgfV (1568981383493189632)]} 0 25

When I went through the code of CloudSolrClient on grepcode, I saw that the
client itself works out which server it needs to hit by hashing the message
id and looking up the shard range information in state.json.
So it is confusing to me why data is being distributed between peers, given
that there is no replication and every shard's only core is its leader.
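To make my understanding of the client-side routing concrete, here is a
minimal sketch of it. Everything in it is an assumption for illustration and
not the actual SolrJ code: the class and method names are hypothetical, the
hash is a plain MurmurHash3 (32-bit) over the UTF-8 bytes of the id, and the
hash space is simply cut into equal slices, whereas the real client reads the
per-shard ranges from state.json. It also ignores ids containing "!", which
I believe the compositeId router treats specially.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of hash-range routing; not SolrJ code.
class HashRouteSketch {

    // Standard MurmurHash3 x86 32-bit over a byte array.
    static int murmur3(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = data.length & 0xfffffffc; // whole 4-byte blocks

        for (int i = 0; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Mix in the remaining 1 to 3 tail bytes, if any.
        int k1 = 0;
        int tail = data.length & 0x03;
        if (tail == 3) k1 ^= (data[roundedEnd + 2] & 0xff) << 16;
        if (tail >= 2) k1 ^= (data[roundedEnd + 1] & 0xff) << 8;
        if (tail >= 1) {
            k1 ^= (data[roundedEnd] & 0xff);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
        }

        // Finalization (avalanche) step.
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Maps a signed 32-bit hash onto one of numShards equal slices
    // (assumption: real ranges come from state.json and may differ).
    static int shardIndex(int hash, int numShards) {
        long offset = (long) hash - Integer.MIN_VALUE; // 0 .. 2^32 - 1
        long step = (1L << 32) / numShards;
        return (int) Math.min(offset / step, numShards - 1);
    }

    // Returns a zero-based slice index for the given document id.
    static int shardFor(String id, int numShards) {
        byte[] bytes = id.getBytes(StandardCharsets.UTF_8);
        return shardIndex(murmur3(bytes, 0), numShards);
    }

    public static void main(String[] args) {
        // One of the ids from the log line above.
        String id = "BQECDwZGTCEBHZZBBiIP";
        System.out.println(id + " -> slice " + shardFor(id, 10));
    }
}
```

If the client really routes like this, every document should already arrive
at the core that owns its hash range, so I would not expect to see any
update.distrib=TOLEADER forwarding in the logs.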

I would like to know why this is happening and how to avoid it. Or does the
above log statement mean something else, and am I misinterpreting it?

-- 
Sathyam Doraswamy
