I couldn't find an issue for this in JIRA, so I thought I would add some of our own findings here. We are seeing the same problem with the Solr 6 restore functionality. Though I don't think it matters much, it happens on both our Linux environments and our local Windows development environments. Also, from our testing, I don't think it has anything to do with actual indexing: notice in the order of my test steps that documents appear in replicas after replica creation, without re-indexing.
Test Environment:
• Windows 10 (we see the same behavior on Linux as well)
• Java 1.8.0_121
• Solr 6.3.0 with the patch for SOLR-9527 (fixes RESTORE shard distribution and adds createNodeSet to RESTORE)
• 1 ZooKeeper node running on localhost:2181
• 3 Solr nodes running on localhost:8171, localhost:8181 and localhost:8191 (hostname NY07LP521696)

Test and observations:
1) Create a 2-shard collection 'test'
http://localhost:8181/solr/admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=1&maxShardsPerNode=1&collection.configName=testconf&createNodeSet=NY07LP521696:8171_solr,NY07LP521696:8181_solr
2) Index 7 documents to 'test'
3) Search 'test' - result count 7
4) Back up collection 'test'
http://localhost:8181/solr/admin/collections?action=BACKUP&collection=test&name=copy&location=%2FData%2Fsolr%2Fbkp&async=1234
5) Restore 'test' to a new collection 'test2'
http://localhost:8191/solr/admin/collections?action=RESTORE&name=copy&location=%2FData%2Fsolr%2Fbkp&collection=test2&async=1234&maxShardsPerNode=1&createNodeSet=NY07LP521696:8181_solr,NY07LP521696:8191_solr
6) Search 'test2' - result count 7
7) Index 2 new documents to 'test2'
8) Search 'test2' - result count 7 (the new documents do not appear in results)
9) Create a replica for each of the shards of 'test2'
http://localhost:8191/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=NY07LP521696:8181_solr
http://localhost:8191/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard2&node=NY07LP521696:8171_solr
*** Note that it is not necessary to re-index the 2 new documents before this step; just create the replicas and query ***
10) Repeatedly query 'test2' - the result count randomly changes between 7, 8 and 9.
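The varying counts in step 10 come down to simple replica-selection arithmetic, which can be sketched in a few lines. This is only an illustration: the 4/3 split of the original 7 documents across the two shards is an assumed example (the real split depends on the compositeId hash of our document IDs), but the mechanism is what we observed - each restored core (replica0) serves a stale searcher that misses the one new document its shard received, while each later-added replica (replica1) serves the full index.

```python
from itertools import product

# Assumed per-core document counts: 7 original docs split 4/3 across
# the shards, one new doc routed to each shard.
# replica0 = restored core (stale searcher, does not see the new doc)
# replica1 = replica added in step 9 (pulled the full index)
counts = {
    ("shard1", "replica0"): 4,  # misses shard1's new doc
    ("shard1", "replica1"): 5,  # 4 original + 1 new
    ("shard2", "replica0"): 3,  # misses shard2's new doc
    ("shard2", "replica1"): 4,  # 3 original + 1 new
}

# A distributed query picks one replica per shard at random, so the
# observed total is one count from each shard, summed.
totals = sorted({
    counts[("shard1", r1)] + counts[("shard2", r2)]
    for r1, r2 in product(["replica0", "replica1"], repeat=2)
})
print(totals)  # the three totals seen in step 10: [7, 8, 9]
```

Whichever way the new documents actually split, the possible totals are exactly the ones a shards-parameter query can pin down, as shown below.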
This is because Solr randomly selects replicas of 'test2', and one of the two new documents was added to each of the shards in the collection: if replica0 of both shards is selected the result is 7, if replica0 of one shard and replica1 of the other is selected the result is 8, and if replica1 of both shards is selected the result is 9. The behavior is random because we do not know ahead of time which shards the new documents will be routed to, or whether they will be split evenly.

Query 'test2' with the shards parameter set to the original restored replicas - result count 7
http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica0,localhost:8181/solr/test2_shard2_replica0
Query 'test2' with the shards parameter set to one original restored replica and one newly added replica - result count 8
http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica0,localhost:8181/solr/test2_shard2_replica1
http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica1,localhost:8181/solr/test2_shard2_replica0
Query 'test2' with the shards parameter set to both newly added replicas - result count 9
http://localhost:8181/solr/test2/select?q=*:*&shards=localhost:8181/solr/test2_shard1_replica1,localhost:8181/solr/test2_shard2_replica1
13) Note that in the Solr admin UI the core statistics show the restored cores as not current: on a restored core the Searching master is Gen 2 while the Replicable master is Gen 3; on the replicated cores both the Searching and Replicable master are Gen 3
14) Restarting Solr corrects the issue

Thoughts:
• Solr is backing up and restoring correctly
• The restored collection data is stored under a path like …/node8181/test2_shard1_replica0/restore.20170307005909295 instead of …/node8181/test2_shard1_replica0/index
• Indexing is actually behaving correctly (documents are available in replicas even without re-indexing)
• When asked about the state of the searcher through the admin page core details, Solr does know that the searcher is not current

I was looking in the source but haven't found the root cause yet. My gut feeling is that because the index data dir is …/restore.20170307005909295 instead of …/index, Solr isn't seeing the index changes and reopening the searcher for the restored cores. Neither committing the collection nor forcing an optimize fixes the issue; restarting Solr fixes it, but that will not be viable for us in production.

John Marquiss

-----Original Message-----
>From: Jerome Yang [mailto:jey...@pivotal.io]
>Sent: Tuesday, October 11, 2016 9:23 PM
>To: solr-user@lucene.apache.org; erickerick...@gmail.com
>Subject: Re: Solrcloud after restore collection, when index new documents into restored collection, leader not write to index.
>
>@Erick Please help😂
>
>On Wed, Oct 12, 2016 at 10:21 AM, Jerome Yang <jey...@pivotal.io> wrote:
>
>> Hi Shawn,
>>
>> I just checked the clusterstate.json
>> <http://192.168.33.10:18983/solr/admin/zookeeper?detail=true&path=%2Fclusterstate.json>
>> which was restored for "restore_test_collection".
>> The router is "router":{"name":"compositeId"}, not implicit.
>>
>> So I think it's a very serious bug.
>> Should this bug go into JIRA?
>>
>> Please help!
>>
>> Regards,
>> Jerome
>>
>>
>> On Tue, Oct 11, 2016 at 8:34 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>>
>>> On 10/11/2016 3:27 AM, Jerome Yang wrote:
>>> > Then, I index some new documents, and commit. I find that the
>>> > documents are all indexed in shard1 and the leader of shard1 doesn't
>>> > have these new documents but other replicas do have them.
>>>
>>> Not sure why the leader would be missing the documents but other
>>> replicas have them, but I do have a theory about why they are only in
>>> shard1. Testing that theory will involve obtaining some information
>>> from your system:
>>>
>>> What is the router on the restored collection? You can see this in
>>> the admin UI by going to Cloud->Tree, opening "collections", and
>>> clicking on the collection.
>>> In the right-hand side, there will be
>>> some info from zookeeper, with some JSON below it that should mention
>>> the router. I suspect that the router on the new collection may have
>>> been configured as implicit, instead of compositeId.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>