On 10/15/2014 9:26 PM, S.L wrote:
Look at the logging information I provided below. It looks like results
are only returned from this SolrCloud cluster if the request goes to one
of the two replicas of a shard.

I have verified that numDocs is the same in the replicas for a given shard,
but there is a difference in maxDoc and deletedDocs. Does this signal that
the replicas are out of sync?

Even if numDocs is the same, how do we guarantee that those docs are
identical and have the same unique keys? Is there a way to verify this?
Since numDocs is the same across the replicas, yet I only get a result back
when the request goes to one particular replica of the shard, I suspect
that the documents within the replicas of a shard are not exact copies of
each other.

I suspect the issue I am facing in 4.10.1 cloud is related to
https://issues.apache.org/jira/browse/SOLR-4924

Can anyone please let me know how to solve this issue of a query
intermittently returning no results?

query with no results hits these cores:
server 2 shard 3 replica 1
server 3 shard 1 replica 1
server 1 shard 2 replica 1

query with 1 result hits these cores:
server 2 shard 1 replica 2
server 3 shard 2 replica 2 (found 1)
server 1 shard 3 replica 2

Here are some URLs for testing. They are directed at specific shard replicas and are specifically NOT distributed queries:

http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb&distrib=false

http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb&distrib=false

If you run these queries (replacing server names and the /select request handler as appropriate), do you get 0 results on the first one and 1 result on the second one? If you do, then you've definitely got replicas out of sync. If you get 1 result on both queries, then something else is breaking. If by chance you have taken steps to fix this particular ID, pick another one that you know has a problem.
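If you would rather script that check than paste URLs into a browser, something like the following Python sketch might help. It only builds the same kind of distrib=false request shown above and compares the numFound values. The function names are made up for illustration, wt=json is added so the response can be parsed, and the "missing on both" branch is my own addition for a case not discussed above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_query_url(core_url, doc_id, handler="select"):
    """Build a non-distributed lookup for one document id on one core.

    core_url is the base URL of a single core, for example
    http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1
    """
    params = urlencode({
        "q": "*:*",
        "fq": "id:" + doc_id,
        "distrib": "false",  # ask only this core, do not fan out
        "wt": "json",        # assumption: JSON response writer is enabled
    })
    return f"{core_url.rstrip('/')}/{handler}?{params}"


def num_found(core_url, doc_id):
    """Run the query against one core and return its numFound."""
    with urlopen(replica_query_url(core_url, doc_id)) as resp:
        return json.load(resp)["response"]["numFound"]


def diagnose(found_a, found_b):
    """Interpret the numFound values from two replicas of one shard."""
    if found_a != found_b:
        return "replicas out of sync"
    if found_a >= 1:
        return "both replicas have the doc; something else is breaking"
    # Not covered in the message above: the id is absent from both replicas.
    return "doc missing on both replicas; pick another id"
```

Calling diagnose(num_found(replica1_url, doc_id), num_found(replica2_url, doc_id)) with your own core URLs and a problem ID would give the same answer as running the two URLs by hand.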

There is no automated way to detect replicas out of sync. You could request all docs on both replicas with distrib=false&fl=id&sort=id+asc, then compare the two lists. Depending on how many docs you have, those queries could take a while to run.
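A rough sketch of that comparison in Python, in case it is useful. The function names and the rows default are assumptions of mine, not anything Solr provides; rows has to be at least as large as numDocs on the core, and for a big index you would probably want to page through the results instead of pulling everything in one request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_ids(core_url, rows=1000000):
    """Fetch every id from one core with a non-distributed query.

    rows is an assumed upper bound; raise it if the core holds more docs.
    """
    params = urlencode({
        "q": "*:*",
        "distrib": "false",
        "fl": "id",
        "sort": "id asc",
        "rows": rows,
        "wt": "json",  # assumption: JSON response writer is enabled
    })
    with urlopen(f"{core_url.rstrip('/')}/select?{params}") as resp:
        docs = json.load(resp)["response"]["docs"]
    return [doc["id"] for doc in docs]


def diff_ids(ids_a, ids_b):
    """Return (ids only on replica A, ids only on replica B), each sorted."""
    a, b = set(ids_a), set(ids_b)
    return sorted(a - b), sorted(b - a)
```

Running diff_ids(fetch_ids(replica1_url), fetch_ids(replica2_url)) should return two empty lists when both replicas hold the same set of ids. Note that this only compares ids, not the contents of the documents.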

If the replicas are out of sync, are there any ERROR entries in the Solr log, especially at the time that the problem docs were indexed?

Thanks,
Shawn
